ADDITIONAL NOTES, 1937

Deming and Birge, "On the statistical theory of errors",
Reviews  of  Modern  Physics  6,  119-161  (July  1934). 


After using the material in this publication for three years, and
watching further developments in statistical inference, the authors
welcome an opportunity to attach a few notes to this 1937 reprinting.
It is hardly worth while to bring the entire paper into line with our
present ideas; it will be sufficient to discuss briefly some of the
main changes in exposition that might be helpful.

Regarding  section  3c,  on  statistical  tests. 

Professor  Fisher  has  kindly  commented  on  some 
parts  of  our  treatment;  see  the  letter  attached.  His 
remarks  on  the  z test  are  especially  illuminating. 

Recent researches of J. Neyman and Egon Pearson inquiring into the
circumstances that need to be taken into account when selecting a
suitable statistical test should also be mentioned. Briefly, the
argument runs something like this: If we are sampling always from a
population with mean μ₀, then the chance of unjustly rejecting μ₀
depends simply upon the size of the region of rejection, and not at
all on its shape or position; u or z contours, or any others, curved
or straight, could serve indistinguishably as boundaries of such a
region. Obviously, the risk of rejecting μ₀ can be reduced to zero by
diminishing to zero the size of the region of rejection, of whatever
shape or position; in other words, by the rule of always accepting μ₀.
And this we should certainly do if the sampling were known to be
always from μ₀; no test of any kind would then be needed or
considered, no matter what the sample mean x̄.

41. See the Statistical Research Memoirs vol. 1, edited by J. Neyman
and Egon S. Pearson, published in 1936 by Biometrika, University
College, London, W. C. 1.

42. The size of a region of rejection for the hypothesis that μ₀ is
the population mean is the chance of a sample point falling into this
region, this chance being calculated on the assumption that μ₀ is the
true mean.

If we do not always accept μ₀, it is because an alternative mean μ₁,
or a whole class of alternatives, is considered a possibility. It is
then that a test for μ₀ becomes important. We need to take a region of
rejection of finite size (as 0.01 or 0.05), so shaped and placed that
if μ₁ is the true mean, the sample point will have the best possible
chance of falling into it, and the mean μ₀ thus be rejected. The shape
and position of the region of rejection need to be chosen with due
regard to whether μ₁ < μ₀, or μ₁ > μ₀, or μ₁ ≷ μ₀; also whether σ is
known or not. In the circumstance that σ is known and μ₁ > μ₀, the
best region of rejection is the shaded area to the right of a u
contour like BB in Fig. 10a. But if σ is not known, the best region of
rejection is to the right of a z contour like OD in Fig. 10c. (Replace
BB by AA, and OD by 00 if μ₁ < μ₀; use both AA and BB, and OD and 00,
if μ₁ ≷ μ₀.) Thus there is really no disagreement between the u and z
tests, because in practice they would never be applicable at the same
time; each is supreme in its own sphere of circumstances.
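
The point about size, shape and position can be checked numerically
for the simplest case, σ known. The sketch below is only an
illustration under stated assumptions: a normal parent population, a
standardized sample mean, and three hypothetical regions of rejection
of size 0.05 (the names `chance_in_region` and `REGIONS` are ours and
do not come from the paper). All three regions reject μ₀ unjustly
with the same chance, but they differ widely in the chance of
rejecting μ₀ when an alternative μ₁ > μ₀ is the true mean.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal law of the standardized sample mean


def chance_in_region(region, mu_true, mu0, sigma, n):
    """Chance that (xbar - mu0) / (sigma / sqrt(n)) falls inside `region`,
    a list of (low, high) intervals, when the population mean is mu_true
    and sigma is known."""
    shift = (mu_true - mu0) / (sigma / math.sqrt(n))
    return sum(STD.cdf(hi - shift) - STD.cdf(lo - shift) for lo, hi in region)


# Three regions of rejection, each of size 0.05 when mu0 is the true mean.
c1 = STD.inv_cdf(0.95)   # boundary for a one-sided region
c2 = STD.inv_cdf(0.975)  # boundary for a two-sided region
REGIONS = {
    "right tail": [(c1, math.inf)],
    "left tail": [(-math.inf, -c1)],
    "both tails": [(-math.inf, -c2), (c2, math.inf)],
}


def sizes_and_powers(mu0, mu1, sigma, n):
    """For each region: (size under mu0, chance of rejecting mu0 under mu1)."""
    return {
        name: (chance_in_region(r, mu0, mu0, sigma, n),
               chance_in_region(r, mu1, mu0, sigma, n))
        for name, r in REGIONS.items()
    }


if __name__ == "__main__":
    for name, (size, power) in sizes_and_powers(0.0, 0.5, 1.0, 9).items():
        print(f"{name:10s}  size = {size:.3f}  rejects mu0 under mu1 = {power:.3f}")
```

With μ₁ > μ₀ the right-tail region gives the greatest chance of
rejection, in agreement with the choice of the area to the right of a
u contour described above.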

Regarding  the  nomograph,  p.  136. 

The nomograph of Nekrassoff is very convenient for Fisher's t test,
whether for one sample or a pair of samples. The scale for z√(n−1) is
really the scale for Fisher's t; and the scale for n, if the numbers
were all reduced one unit, would be the scale for Fisher's n (degrees
of freedom). The P_z scale is then Fisher's P for the t test.
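
The relation t = z√(n−1) can be verified on any small sample. The
following is a minimal sketch, assuming that s is the S.D. of the
sample computed with divisor n (as in this paper) and that z = u/s
with u = x̄ − μ₀; the function names and the numerical sample are
hypothetical.

```python
import math
import statistics


def student_z_and_fisher_t(sample, mu0):
    """Return (z, t, degrees of freedom) for a proposed population mean mu0."""
    n = len(sample)
    xbar = sum(sample) / n
    # S.D. of the sample with divisor n, as used for Student's z
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / n)
    z = (xbar - mu0) / s
    t = z * math.sqrt(n - 1)
    return z, t, n - 1


def textbook_t(sample, mu0):
    """Fisher's t written in the usual way, with the divisor n - 1."""
    n = len(sample)
    return (statistics.fmean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))


if __name__ == "__main__":
    data = [10.1, 9.8, 10.4, 10.0, 10.2]
    z, t, dof = student_z_and_fisher_t(data, 10.0)
    print(f"z = {z:.4f}, t = {t:.4f}, degrees of freedom = {dof}")
```

The two ways of writing t agree to rounding error, which is why one
scale of the nomograph serves for both.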

Regarding  section  3e,  on  fiducial  limits. 

In our treatment of fiducially related values of s and σ it was stated
that there are but 5 chances in 100 that σ(s,5) is less than the S.D.
of the parent population. σ(s,5) is easily calculated from Table III
(page 143) for a given value of s. Now it is essential to understand
in just what sense "5 chances in 100" and "the given value of s" are
to be taken. There must be no selection of values of s; the rule for
calculating σ(s,5) must be followed consistently for all values of s
in one's entire sampling experience, whether or not σ varies from one
sample to another, and whether or not any such variation is known to
exist. Under the assumption that the sampling distribution of s in
samples from populations having any given value of σ follows the
Helmert curve (Eq. 14), it can then be said that on the average only
1 in 20 of the values of σ(s,5) will be less than the S.D. of the
sampled population. But if σ(s,5) were calculated only whenever s fell
within some previously selected range, no such statement would hold.
We are indebted to Dr. T. Koopmans of Amsterdam and to Professor Egon
Pearson for pointing out our failure to note this point in our
original exposition.
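
Both halves of this statement can be illustrated by a small sampling
experiment. The sketch below rests on assumptions that are ours, not
the paper's: n = 5; s computed with divisor n, so that ns²/σ² follows
the chi-square law with n − 1 degrees of freedom; σ(s,5) computed as
s√(n/χ²), where χ² is the lower 5 percent point, in place of a lookup
in Table III; and σ taking the values 1 and 3 at random from one
sample to the next. The selected range 0.3 to 0.6 for s is purely
hypothetical.

```python
import math
import random


def chi2_cdf_even_df(x, df):
    """Chi-square distribution function; closed form valid for even df only."""
    half = x / 2.0
    term = total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return 1.0 - math.exp(-half) * total


def chi2_lower_point(p, df):
    """Value below which a chi-square variate falls with chance p (bisection)."""
    lo, hi = 0.0, 500.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even_df(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fiducial_factor(n):
    """sigma(s,5) / s, for s computed with divisor n (n - 1 must be even)."""
    return math.sqrt(n / chi2_lower_point(0.05, n - 1))


def fraction_below_sigma(trials, n, seed, s_range=None):
    """Fraction of calculated values of sigma(s,5) that fall below sigma."""
    rng = random.Random(seed)
    factor = fiducial_factor(n)
    used = below = 0
    for _ in range(trials):
        sigma = rng.choice([1.0, 3.0])  # sigma varies from sample to sample
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(xs) / n
        s = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
        if s_range is not None and not (s_range[0] <= s <= s_range[1]):
            continue  # sigma(s,5) not calculated: selection on s
        used += 1
        below += factor * s < sigma
    return below / used


if __name__ == "__main__":
    print("rule applied to every sample:", fraction_below_sigma(20000, 5, seed=1))
    print("rule applied only when 0.3 <= s <= 0.6:",
          fraction_below_sigma(20000, 5, seed=1, s_range=(0.3, 0.6)))
```

When the rule is applied to every sample the fraction stays near 1 in
20 even though σ varies; once values of s are selected, the fraction
depends on how σ itself is distributed.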


It would be well to mention here also a simple diagram for showing the
fiducial relation between s and σ. In Fig. 16 the horizontal line at
ordinate σ is shaded proportional to the density of samples following
a Helmert distribution of s, for a chosen value of n. Five percent of
the values of s lie to the left of B and 95 percent to the right. Now
suppose that in a sample of n the S.D. is found to be s. Let the
distance s be laid off from O on the horizontal axis. A vertical line
through s then cuts OA at D, and the ordinate of D is σ(s,5).

Fig. 16. Diagram for finding σ(s,5). In Neyman's terminology, OA forms
a 'confidence belt' with the s axis of 'confidence coefficient' 0.95.
The slope of OA depends on n; see Table III.

Whenever s falls to the right of C, as pictured, and as happens in 95
percent of the samples, then σ(s,5) > σ. But if s had fallen to the
left of C, as will happen in 5 percent of the samples, then we should
have found σ(s,5) < σ. These statements hold no matter what σ may be,
known or unknown, and whether σ be constant or not from one sample to
another. This is the sense in which the chances are 19 : 1 that a
random value of σ(s,5) > σ.

The same probability statement could as well be made concerning a
random fiducial limit properly calculated from an insufficient
statistic. However, if a sufficient statistic can be found, as is the
case in the problem here considered, a fiducial limit calculated
otherwise would be of no interest. Thus, once σ(s,5) is calculated, we
should not have the slightest concern for a fiducial limit calculated
from the mean deviation (1/n)Σ|xᵢ − x̄|. This circumstance undoubtedly
explains the apparent contradiction with the last paragraph of the
attached letter.


A sample, no matter what σ be, is represented by a point somewhere in
the plane of Fig. 16. Obviously, if one were to calculate σ(s,5) only
for the subclass of samples falling within the vertical dashed lines
of Fig. 16, no probability statement could be made without knowing the
prior existence curve for σ; see p. 161. This is in other words the
statement attributed to Dr. Koopmans four paragraphs back.

It is interesting to note that the first published statement on the
notion of fiducial or confidence limits was made by E. B. Wilson in
1927. He dealt with confidence limits for samples from a discrete
universe, but without introducing a name for them.

Regarding sections 4b and 4c, on estimates of σ.

The terms biased and unbiased have been introduced in describing
estimates of parameters; e.g. s²n/(n−1), as given in Eq. (36), is an
unbiased estimate of σ², because the average value of s²n/(n−1) in a
Helmert distribution is σ². Likewise the mean estimate s(σ/s̄),
Eq. (38), is an unbiased estimate of σ. But s√(n/(n−1)) is a biased
estimate of σ, and [s(σ/s̄)]² is a biased estimate of σ². The factor
n/(n−1) in place of unity was ascribed on p. 145 to Bessel; however,
it was more likely Gauss or his pupil Encke who first recognized the
need of correcting for the loss of one degree of freedom for every
condition imposed by the adjustment if one would secure what we now
call an unbiased estimate of σ².
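
The four statements about bias can be checked from the moments of the
Helmert distribution. The sketch below assumes a normal parent and s
computed with divisor n, so that ns²/σ² follows the chi-square law
with n − 1 degrees of freedom; the closed form used for the mean of s
is the standard chi-distribution result, not a formula quoted from
this paper, and the function names are ours.

```python
import math


def mean_s_over_sigma(n):
    """Average value of s divided by sigma (s computed with divisor n)."""
    return math.sqrt(2.0 / n) * math.gamma(n / 2.0) / math.gamma((n - 1) / 2.0)


def relative_means(n):
    """Average value of each estimate divided by the quantity it estimates.
    A value of exactly 1 means the estimate is unbiased."""
    m1 = mean_s_over_sigma(n)  # mean of s, in units of sigma
    m2 = (n - 1) / n           # mean of s^2, in units of sigma^2
    return {
        "s^2 n/(n-1) for sigma^2": m2 * n / (n - 1),
        "s (sigma/s_bar) for sigma": m1 / m1,
        "s sqrt(n/(n-1)) for sigma": m1 * math.sqrt(n / (n - 1)),
        "[s (sigma/s_bar)]^2 for sigma^2": m2 / m1 ** 2,
    }


if __name__ == "__main__":
    for name, value in relative_means(5).items():
        print(f"{name:35s} {value:.4f}")
```

For n = 5 the third estimate averages about 0.94σ and the fourth about
1.13σ², while the first two average exactly σ² and σ respectively.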

In contrast with other estimates, the median estimate (p. 146) of σ is
invariant, whether made from the median of the distribution of s, or
s², or any function of s. We are indebted to Dr. Alan E. Treloar for
this interesting fact. A recent article by Pitman on "closest"
estimates discusses some other properties of the median estimate.
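
The invariance follows because a median is carried along by any
increasing function, whereas a mean is not. A minimal sketch by
simulation, assuming a normal parent, samples of n = 5 and s computed
with divisor n (the names and the choice of 2001 samples are
hypothetical):

```python
import math
import random
import statistics


def simulated_s_values(n, sigma, samples, seed):
    """S.D.'s (divisor n) of repeated samples of n from a normal parent."""
    rng = random.Random(seed)
    out = []
    for _ in range(samples):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(xs) / n
        out.append(math.sqrt(sum((x - mean) ** 2 for x in xs) / n))
    return out


def compare_median_and_mean(s_values):
    """Compare medians and means of s and of s^2 over the same samples."""
    squares = [s * s for s in s_values]
    return {
        "median of s, squared": statistics.median(s_values) ** 2,
        "median of s^2": statistics.median(squares),
        "mean of s, squared": statistics.fmean(s_values) ** 2,
        "mean of s^2": statistics.fmean(squares),
    }


if __name__ == "__main__":
    # An odd number of samples, so that the median is a single observed value.
    for name, value in compare_median_and_mean(
            simulated_s_values(5, 2.0, 2001, seed=7)).items():
        print(f"{name:22s} {value:.4f}")
```

The median of s² equals the square of the median of s, so an estimate
of σ scaled to the median is the same whichever of the two
distributions it is made from; the corresponding means differ.
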

Two papers, one by Paul R. Rider, the other by J. F. Tocher, had been
used with profit by the authors, but were inadvertently omitted in the
citations to literature.

It is a pleasure to acknowledge the assistance of Dr. Samuel S. Wilks,
of Princeton, on several points.

W.  Edwards  Deming 
Raymond T. Birge


A Letter to the Editor of The Physical Review


On  the  Statistical  Theory  of  Errors 


Professor R. A. Fisher has most kindly responded to our request for
criticisms of the article that appeared under the above title.¹ The
material in his letter is much too valuable to be filed away, so with
his consent we here present the substance of his comments, together
with some additions here and there of our own.

It is doubtful if on page 135 it was made sufficiently clear that in
the absence of a reliable estimate of σ, the u test cannot be used,
and that the z test (which is equivalent to Fisher and Student's t
test) is the only recourse. (By a reliable estimate of σ we mean an
estimate that is considerably more reliable than can be obtained from
the single sample under test.) The z test is not inherently
misleading; it tests objectively a proposed value of z, and for this
purpose it is of course perfectly valid (as we say). Like any
statistical test, the z test lays down and accepts a perfectly
definite hazard. Misinterpretations of the z test may be common, but
the blame should be placed, not on the test itself, but on
misunderstandings of the nature that we point out on page 135. What is
more to be feared than over-confidence in the z test is the use of the
normal probability integral (the u test) with an estimate of σ based
on the single sample under test.

The separation of the parameters of the parent population from
estimates of these parameters has been a gradual process. Many writers
have been extremely careless in confusing that which is estimated with
an estimate of it. Thinking to avoid any such looseness, we
systematically used Greek and Latin letters to distinguish the two
classes of quantity. It is perhaps well to go even further and use
distinguishing names for the two classes. For this purpose there are
in use today the terms “parameter” of the parent population and
“statistic” of the sample, the word “statistic” having been introduced
by Fisher (footnote 4 of our article) in 1921 to fill the need of a
term antithetical to “parameter.” A parent population is completely
specified by its one or two or more parameters, but a sample of n
would require n different statistics for its specification. To each of
these statistics there corresponds a particular parameter or
parametric function toward which the value of the statistic tends as
the sample is indefinitely increased; but to each parameter there
“corresponds,” in this sense, as many statistics as there can be of
samples from a given parent population, to which number there is no
limit. For these reasons it would doubtless have been better to have
written “corresponding statistic of a sample” on page 142, 7 lines
below section (3e), to avoid giving the impression that there is a one
to one correspondence between the two quantities s and σ.

In further connection with fiducial probability it should be mentioned
that fiducial values can be taken only from distributions of
statistics that contain the whole of the information that can be
obtained from the sample. The distribution of s fulfills this
requirement, and our discussion of fiducially related values of σ and
s is therefore valid, but it is worth while to note that the
distribution of, for example, the arithmetic mean deviation, from
which Peters' formula (see any text on least squares) is derived,
could not be so used. There is not room here, and neither was there in
the original article, to discuss the criteria of “efficiency” and
“sufficiency,” but they might at least be mentioned with a reference.
The reader will find them discussed in the papers cited in footnotes 4
and 31.

W.  Edwards  Deming, 

Bureau  of  Chemistry  and  Soils, 
Washington,  D.  C. 

Raymond  T.  Birge, 

University  of  California 
Berkeley. 

November 9, 1934.

¹ Deming and Birge, Rev. Mod. Phys. 6, 119 (1934).

ν, n: the number of items in the parent population, and the number of items in the sample.
μ, x̄ (p. 123): the means of the parent population and of the sample.
σ, s (pp. 125, 126): the square roots of the second moments about the means, or the standard deviations (S.D.), of the parent population and of the sample.
x_i (p. 125): the observations constituting a sample.
ε_i = x_i − μ (p. 125): true errors.
v_i = x_i − x̄ (p. 125): residuals.
u = x̄ − μ (p. 126): true error of a sample.
[symbol illegible] (p. 125): probable error (p.e.) of a single observation.
r (p. 127): p.e. of the mean of n observations.
r.m.s. (p. 126): root mean square.
N (page number illegible): an indefinitely large number of samples of n drawn from the parent population.
s [distinguishing mark illegible] (p. 128): the mode of the sampling distribution of s, Eq. (14), Helmert's equation.
s̄ (p. 128): the mean of the sampling distribution of s.
s [distinguishing mark illegible] = σ/f (p. 128): the median of the sampling distribution of s. This defines f. σ would be the median on the sampling distribution of fs.
B(m,n) (p. 128): the beta function ∫₀¹ x^(m−1) (1 − x)^(n−1) dx. The arguments m and n are interchangeable.
Γ_x(n) (p. 129): the incomplete gamma function ∫₀ˣ x^(n−1) e^(−x) dx.
Γ(n) (p. 129): the complete gamma function (the same integral with limits 0 to ∞).
z (p. 129): = u/s, the true error of the mean in units of the S.D. s of the sample (abscissa of Student's distribution, Eq. (21)).
δ (p. 132): defines a contour of arbitrary constant altitude on the u,s frequency surface.
λ (p. 132): defines a contour on the u,s frequency surface along which the probability of a given set of errors bears the constant ratio λ to the maximum value that this probability can attain.
P_u (p. 132): probability of drawing a sample with true error greater than |u|.
γ (p. 133): denotes 0.674···/√2 = 0.476936276···.
P_s (p. 133): probability of drawing a sample with S.D. greater than s.
χ (p. 133): argument in the chi-test.
P_z (p. 135): probability of drawing a sample with |u/s| = |z| greater than a specified value.
P_λ (p. 137): probability of drawing a sample lying outside a specified λ contour of the u,s frequency surface.
P_δ (p. 137): probability of drawing a sample lying outside a specified δ contour of the u,s frequency surface.
r [distinguishing mark illegible] (p. 140): value of |z| for which P_z = ½, i.e., the quartile deviation in Student's distribution of z in samples of n.
φ (p. 141): denotes the factor 0.674···f/√n. r would be the median on the sampling distribution of φs.
σ(s,5) (p. 142): the 5 percent fiducial value of σ corresponding to a given value of s. There is 1 chance in 20 that the S.D. of the parent population is greater than σ(s,5) for the given value of s.
s(σ,95) (p. 142): the 95 percent fiducial value of s corresponding to a given value of σ. There are 19 chances in 20 that the S.D. of the sample is greater than s(σ,95) for the given value of σ.
r(s,5) (p. 143): the 5 percent fiducial value of r corresponding to a given value of s. There is 1 chance in 20 that the probable error of the mean of n observations is greater than r(s,5) = φ_95 s.
f_95 (p. 142): that particular value of σ/s designated by σ(s,5)/s or by σ/s(σ,95).
φ_95 (p. 143): denotes the factor 0.674···f_95/√n; φ_95 s = r(s,5).
f_50 (p. 143): the same as f; the subscript 50 is used for emphasis, especially in discussing 50 percent fiducial values of σ and s.
φ_50 (p. 143): the same as φ; the subscript 50 is used for emphasis, especially in discussing 50 percent fiducial values of σ and s.
σ_e, r_e (p. 145): estimates of σ and of r derived from the sample alone.
[symbol illegible] s (p. 147): some multiple of s, denoting an estimate, σ_e, made from a sample.
F (p. 148): the r.m.s. error in an estimate of σ, in units of σ, or the r.m.s. error in an estimate of r, in units of r, both of which are equal to the estimated proportional r.m.s. error in an estimate of σ or of r.
[symbol illegible](σ) (pp. 150, 151): the ordinate at the abscissa σ on a prior existence curve for the S.D. of the parent population.
p or p(σ) (pp. 150, 151): the ordinate at the abscissa σ on a posterior curve for the S.D. of the parent population.
s_0 (p. 154): an observed S.D. in a sample of n.
r_q (p. 154): the “posterior quartile deviation” of u, the quartile deviation at the section s = const. on the posterior probability surface for u and s.
[symbol illegible](μ) (p. 155): the ordinate at the abscissa μ on a prior existence curve for the mean of the parent population.
a, b, c (p. 155): adjustable parameters in Molina and Wilkinson's forms of prior existence curves for μ and σ.
T (p. 156): denotes n + 2 + c + [term illegible] in Molina and Wilkinson's curves.
q(u) (p. 156): the ordinate at the abscissa u on the posterior surface for u and s, taken at the section s = const.
t (p. 156): the quartile deviation on Student's curve when n is replaced by [expression illegible].
r_q(50), r_q(80), r_q(90), r_q(99.73) (p. 156): the 50, 80, 90 and 99.73 percentile deviations at the section s = const. on the posterior probability surface for u and s. r_q is generally used in place of r_q(50).
n_i: the number of observations on the mean μ_i (i = 1, 2, ···, m).
μ_i (p. 158): the mean of the parent population from which the n_i observations constitute a sample (i = 1, 2, ···, m). The m means μ_1, ···, μ_m may or may not all be distinct, but the m parent populations all have the same S.D. σ.
m (p. 158): a finite number of samples or series of observations on means that may all be distinct and in which n may vary from one sample to another, but for which σ is constant.
u_i (p. 158): the error in the mean of the n_i observations on the mean μ_i.
s_i (p. 158): the S.D. of these n_i observations.

§1.  Introduction 

Some of the recent advances in probability and mathematical statistics
throw considerable light on the theory of errors. Problems that arise
in drawing conclusions from observations are essentially statistical
and should be handled as such. Unfortunately the literature on
statistics has received but scant notice from writers of treatises on
errors. In the present paper we shall attempt to put the pertinent
results of statistics in such a form that they will be useful for the
interpretation of physical data.

Pursuit of the theory of errors is often considered to be futile for
the reason that systematic errors, suspected or unsuspected, may be so
large as to eclipse any accidental error likely to occur. It is true
that a statistical treatment of the data obtained from a single
experiment performed under controlled conditions can never disclose
the systematic errors in that one experiment. It is only by comparing
the results of several observers that it is possible to form some idea
as to whether all observers were really measuring the same thing or
if, on the contrary, the systematic errors present in one experiment
were different from those in the others. Such comparisons are possible
only when the data of each observer have been correctly treated,
statistically, on the assumption that all systematic corrections have
been eliminated. For this reason a working knowledge of the theory of
errors is indispensable to the interpretation of experimental data.
The detection of systematic errors by statistical analysis has been
discussed and applied by one of the writers.¹

¹ Raymond T. Birge, Phys. Rev. 40, 207-227 (1932); 40, 228-261 (1932).


The branch of statistics that concerns the theory of errors is called
“sampling” or “the theory of small samples.” The object of sampling is
to make possible an estimation of the magnitude and variability of
some measurable property of a very large number of items by testing
only a portion of them. From the measurements of the individuals in a
random sample of 5, 10, 20, 30 or more items, and from previous
experience with similar items, some estimate of the mean of the
measurable magnitude and of its variability in the entire lot can be
made by statistical methods of induction. The confidence that one may
place in such an estimate depends on the size of the sample and on
previous experience with similar items, when such experience is
available. Complete confidence or certainty can only be approached as
a limit by indefinitely increasing the size of the sample. No
guarantee can be made beforehand as to how large the sample must be in
order that an estimate shall lie within a specified amount from the
true value²; however, it may be possible to lay odds beforehand that
an estimate will fall within the specified range. The theory of
sampling furnishes both the methods of estimation and the odds.

² The reader may consult J. M. Keynes, A Treatise on Probability,
Ch. 29 (Macmillan, 1921); W. A. Shewhart, The Economic Control of
Quality, pp. 362, 438 (Van Nostrand, 1931); Thornton C. Fry,
Probability, Ch. 3 (Van Nostrand, 1928); M. S. Bartlett, Proc. Roy.
Soc. A141, 518-534 (1933), especially pages 520 and 521.

A “frequency curve” is a curve so constructed that the area included
between two abscissas is equal to the number of items having a
measured quality lying within the range defined by these abscissas.
Since the area of any strip must be integral and therefore finite,
even though the abscissas differ only infinitesimally, it is clear
that the total area under any frequency curve must be infinite and
that its actual construction would require an unattainable number of
measurements. A frequency curve therefore is an attribute of a
hypothetical and indefinitely large aggregate, known by the term
“parent population.” An actual sample, no matter how large, is finite,
and therefore will have not a frequency curve but a frequency polygon.

As the size of the sample is indefinitely increased and the “class
interval” along the abscissa indefinitely decreased, the frequency
polygon of the sample approaches the frequency curve of the parent
population from which it is drawn. The parent population and its
frequency curve have the same objective existence as any statistical
limit; hence they can be approached to any desired degree by the two
expedients (a) taking a large enough sample, and (b) refining the
measurements so that enough figures are recorded for each item to
allow a sufficiently small class interval.

In  the  theory  of  errors  a set  of  n equally  reliable 
observations  may  be  considered  as  a sample  of  n 
drawn  at  random  from  an  indefinitely  large 
number ν of observations that might be made if
time  and  opportunity  would  permit  and  if  the 
apparatus  would  not  wear  out.  This  hypothetical 
aggregate  will  be  the  parent  population  in  the 
problem. 

If there were no systematic errors present, the mean of the parent
population would be the true value of the quantity being measured. The
effect of a systematic error is to displace the mean of the parent
population of observations above or below the true value. This
correction, if ever isolated and evaluated, can be added to or
subtracted from the mean of the parent population to give the true
value.

The object of making the n observations is to estimate what would be
obtained for the mean of an indefinitely large number of observations;
in other words, the object is to estimate the position of the mean of
the parent population. Its exact value remains unknown because n is
finite. As our hopes vanish of ever knowing exactly the mean of the
parent population, we become increasingly interested in the number of
significant figures in the estimate. That is, if x̄ is an estimate of
the mean μ of the parent population, we should like to know what is
the chance that x̄ differs from μ by a stated amount. On the basis of
certain assumptions regarding the form of the parent population, the
study of statistics furnishes the answers to this question and to
several others that arise.
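
In the simplest case the question has a closed answer. The sketch
below assumes a normal parent population whose S.D. σ is known, so
that x̄ is normally distributed about μ with S.D. σ/√n; the function
name and the numbers are hypothetical, and the case of unknown σ
requires Student's distribution instead.

```python
import math
from statistics import NormalDist


def chance_mean_off_by_more_than(d, sigma, n):
    """Chance that |xbar - mu| exceeds d, for a normal parent with known sigma."""
    return 2.0 * (1.0 - NormalDist().cdf(d * math.sqrt(n) / sigma))


if __name__ == "__main__":
    sigma, n = 2.0, 16
    # The probable error of the mean, 0.674... * sigma / sqrt(n), is the
    # deviation that is exceeded in half of all samples.
    probable_error = NormalDist().inv_cdf(0.75) * sigma / math.sqrt(n)
    print(chance_mean_off_by_more_than(probable_error, sigma, n))
    print(chance_mean_off_by_more_than(1.0, sigma, n))
```

For a fixed deviation d the chance falls as n is increased, which is
the sense in which accidental errors can be reduced by taking more
observations.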

The true value of the quantity being measured is approached by
correcting for systematic errors, one after another. The effect of
accidental errors can be reduced as far as desired by taking enough
observations. The measurement of each systematic correction presents a
problem in statistics, for a correction cannot be intelligently
applied unless its precision is stated.

§2.  The  Specification  of  the 
Parent  Population 

The  frequency  curve  for  the  parent  population 
will  be  assumed  “normal.”  There  are  several 
reasons  for  this  choice.  In  the  first  place,  for  error 
theory  the  normal  curve  is  nearly  always  an 
excellent  approximation.  Furthermore,  several 
investigations  on  non-normal  populations  have 
shown  that  even  considerable  departures  from 
normality  do  not  produce  appreciable  changes  in 
many  important  deductions  based  on  the  normal 
curve.  It  has  also  been  established  that  the 
frequency  curve  formed  by  the  means  of  samples 
drawn  from  a non-normal  parent  population  is 
often  much  more  nearly  normal  than  the 
population  itself.  While  there  exist  several  types 
of  measurement  that  by  nature  do  not  have 
normal  parent  populations,  rarely  will  deductions 
based  on  the  normal  law  fail  to  be  valid. 
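
The remark that means of samples are more nearly normal than the
population itself is easily illustrated by simulation. The sketch
below takes, purely as an assumption for illustration, a strongly skew
parent population (the exponential law, whose skewness is 2) and
compares its skewness with that of the means of samples of 10; a
normal curve has skewness zero. The names are ours.

```python
import random


def skewness(values):
    """Third moment about the mean divided by the cube of the S.D."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def population_and_means(sample_size, draws, seed):
    """Single observations, and means of samples, from an exponential parent."""
    rng = random.Random(seed)
    singles = [rng.expovariate(1.0) for _ in range(draws)]
    means = [
        sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size
        for _ in range(draws)
    ]
    return singles, means


if __name__ == "__main__":
    singles, means = population_and_means(10, 20000, seed=3)
    print("skewness of single observations:", round(skewness(singles), 2))
    print("skewness of means of 10:        ", round(skewness(means), 2))
```

The skewness of the means is much smaller than that of the single
observations, and it shrinks further as the sample size is increased.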

It is therefore idle to investigate whether a parent population is
exactly normal. However, it may be worth while to discuss some
arguments that are commonly advanced as proof that the normal law
cannot possibly ever be obeyed. The most incisive arguments run as
follows: (a) Since only certain discrete values can be recorded, the
probability for all intermediate values is zero. Therefore the law of
error cannot be continuous, hence cannot be the normal curve, because
of the inherent discontinuous nature of measurement. (b) The frequency
polygon of a set of measurements is nearly always skew and irregular,
whereas a symmetrical and regular figure should be obtained if the
normal law holds. (c) Extremely large residuals apparently do not
occur, whereas according to the normal law they should occur once in a
while. When the statistical view is taken and the normal curve becomes
a frequency curve for the parent population of observations, the
fallacies in these objections become evident, as will now be
explained.

The  discontinuous  nature  of  measurement  has 
nothing  to  do  with  the  law  of  error,  which  is  the 
specification  of  the  parent  population.  The  step 
or  least  count  of  the  instrument,  being  finite, 
simply  has  the  effect  of  grouping  the  observations 
into  class  intervals.  Such  grouping  must  always 
be  accomplished  before  a frequency  polygon  can 
be constructed; if the instrument did not attend
to  this,  the  computer  would  have  to  do  it. 

It  might  be  expected  that  the  moments  of  a set 
of  n measurements  would  vary  somewhat  as  the 
least  count  and  the  zero  of  the  measuring  scale 
are  changed,  and  such  is  in  fact  the  case.  This 
effect  has  been  carefully  investigated  by  Shep- 
pard,^ Fisher,^  and  Wilson;^  and  the  corrections 
to  be  applied  to  the  various  moments  on  account 
of  the  finite  width  of  the  class  interval  have 
properly  come  to  be  known  as  “Sheppard’s 
corrections.”  These  serve  to  bridge  the  gap 
between  a continuous  law  of  error  and  the 
discontinuous  nature  of  measurement.  Such 
investigations  have  served  to  show  that  the  least 
count  of  the  instrument  should  be  small  enough 
so  that  when  a large  number  of  readings  (perhaps 
a hundred  or  more)  are  taken,  there  will  be  a 
variation  in  the  recorded  terminal  digits  of 
around  20  units,  for  otherwise  a considerable 
portion  of  a set  of  observations  is,  in  effect, 
scrapped.  An  astonishingly  large  number  of 
observations  may  be  required  to  overcome  the 
damage  done  by  unnecessarily  coarse  reading  or 
graduation  of  the  scale. 
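The effect of a finite least count on the second moment can be illustrated numerically. The sketch below is not from the original paper; it assumes a normal parent population of mean 0 whose readings are grouped into strips of width h centered on multiples of h, and it applies the usual form of Sheppard's correction to the second moment (subtract h²/12). The function names are hypothetical.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Fraction of a normal population lying to the left of x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def grouped_variance(h, sigma=1.0, span=12.0):
    """Second moment about the mean of readings grouped by a least count h.

    Assumption: the parent population is normal with mean 0 and S.D. sigma,
    and the reading k*h collects the area of the strip that runs from
    (k - 1/2)h to (k + 1/2)h.
    """
    k_max = int(span * sigma / h) + 1
    mean = 0.0
    second = 0.0
    for k in range(-k_max, k_max + 1):
        x = k * h
        area = normal_cdf(x + h / 2, 0.0, sigma) - normal_cdf(x - h / 2, 0.0, sigma)
        mean += x * area
        second += x * x * area
    return second - mean * mean

for h in (0.25, 0.5, 1.0):
    raw = grouped_variance(h)
    corrected = raw - h * h / 12.0  # Sheppard's correction to the second moment
    print(f"least count {h:4.2f} sigma: raw {raw:.6f}, corrected {corrected:.6f}")
```

Under these assumptions the grouped second moment exceeds σ² by very nearly h²/12, so the corrected value returns to σ² even for a least count as coarse as σ itself. The correction repairs the bias only; it does not restore the precision lost by coarse reading, which is the point made in the text.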

The  appearance  of  a frequency  polygon  can  be 
very  misleading.  Even  when  there  are  many 
hundred  observations  in  a set,  the  appearance  of 
the  polygon  may  be  of  little  value  for  inferring 
the  law  of  error.  Fortunately  the  adequacy  of  a 
chosen  parent  population,  whatever  it  may  be 
and however arrived at, can be tested quantitatively
and objectively by Karl Pearson’s chi-test
or criterion for goodness of fit.⁶ This test
determines the probability that a given set of
observations follows the normal law or some
other proposed form. The chi-test provides the
only decisive criterion, yet it is almost never used
by physicists. One good reason is that at least 500
observations are required in order that confidence
may be placed in the result.⁷ Even when the test
shows a small probability that the set of observations
came from a normal parent population,
conclusions based on the normal law will usually
be safe.

³ W. F. Sheppard, Proc. London Math. Soc. 29, 353-380
(1897); J. Roy. Stat. Soc. 60, 698-703 (1897).

⁴ R. A. Fisher, Phil. Trans. Roy. Soc. A222, 309-368
(1921-22).

⁵ E. B. Wilson, Proc. Nat. Acad. Sci. 13, 151-156 (1927).

If  the  least  count  of  the  instrument  were 
infinitesimal,  the  normal  law  would  admit  the 
occurrence  of  a certain  small  proportion  of  very 
large  residuals.  But  in  practice  the  least  count  is 
always  finite,  and  this  serves  to  divide  the  area 
under  the  frequency  curve  into  rectangular  strips 
every  one  having  width  equal  to  the  least  count, 
and  the  one  of  maximum  height  being  centered  at 
the  mean  of  the  curve.  The  readings  that  can  be 
made  on  the  instrument  are  the  abscissas  of  the 
centers  of  these  strips,  and  if  an  infinite  number 
of  readings  were  taken,  the  number  recorded  of  a 
particular  magnitude  would  be  the  area  of  the 
corresponding  strip.  Now  where  the  curve 
approaches  the  horizontal  axis,  the  areas  of  the 
successive  strips  decrease  very  rapidly  because  of 
the infinitely high order of contact made by the
curve. This will especially be true if the graduations
on the scale are coarse, for unless the least
count  is  extremely  fine  there  will  always  be  some 
outlying  strip  whose  area  is  much  greater  than 
all  the  area  lying  beyond.  The  abscissa  of  the 
center of this strip will then, in the long run, be
recorded more frequently than all the further
outlying readings combined, which means that in
practice the residuals apparently have an upper
limit. Extremely large residuals will occur once in
a while, but their frequency is much diminished
by the discontinuity of measurement and the
shape of the normal curve. The fact that
extremely large residuals are seldom found
supports the normal law and does not subvert it.
As was pointed out by Pearson⁶ in his original
paper on the chi-test, and as has been clearly
explained by all later writers on the same subject,
it is necessary to lump the tail of a frequency
curve into a single “cell”; consequently slight
disagreements between calculated and observed
frequencies in the tails of the curve are of no
concern whatever, either in making an objective
test (such as the chi-test) of the fit of the curve or
in speculations on the extent to which departures
from normality may invalidate deductions that
are based on a normal parent population. Thus
the last argument is found to be irrelevant.

⁶ Karl Pearson, Phil. Mag. 50, 157-175 (1900). This was
Pearson’s first paper on the chi-test. Tables for using the
criterion were computed by W. Palin Elderton, and
appeared first in Biometrika 1, 155-163 (1901-02). These,
with additions and examples, are found in Tables for
Statisticians and Biometricians, Part I, edited by Karl
Pearson and published in 1914 by the Biometric Laboratory,
University College, London, W. C. 1. Some important
discussions of the chi-test are summarized by R. A. Fisher
in his Statistical Methods for Research Workers (published
by Oliver and Boyd, 1925, 4th edition, 1932).

⁷ It is interesting to notice the frequency polygon for
500 measurements of a spectral line made by one of us
(reference 1, p. 210). The chi-test gives P = 0.22, which
means that in about 1 out of 5 trials we should expect in
random sampling a larger $\chi^2$ than that here obtained if the
real distribution is normal. This probability is not only
high, but is a result that could never have been deduced
from the mere appearance of the polygon.
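The argument about outlying strips can be made concrete with a short computation. The sketch below is illustrative only and is not part of the original paper; it assumes a normal parent population, measures abscissas in units of σ, and centers the strips on multiples of the least count h. The function names and the threshold of ten are hypothetical choices.

```python
import math

def upper_tail(x):
    """Fraction of a normal population lying beyond x (x in units of sigma)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def strip_ratio(k, h):
    """Area of the strip centered at k*h divided by all the area beyond it."""
    inner = upper_tail((k - 0.5) * h)
    outer = upper_tail((k + 0.5) * h)
    return (inner - outer) / outer

def first_strip_exceeding(h, ratio, k_limit=1000):
    """Index of the first strip whose area exceeds `ratio` times the area beyond."""
    for k in range(k_limit):
        if upper_tail((k + 0.5) * h) == 0.0:
            break  # the tail has underflowed; stop rather than divide by zero
        if strip_ratio(k, h) > ratio:
            return k
    return None

for h in (0.25, 0.5, 1.0, 2.0):
    k = first_strip_exceeding(h, 10.0)
    print(f"least count {h:4.2f} sigma: strip at {k * h:.2f} sigma holds "
          f"{strip_ratio(k, h):.1f} times the area beyond it")
```

The ratio grows steadily with distance from the mean, and the coarser the least count, the nearer to the mean lies the strip that outweighs everything beyond it. This is the sense in which the residuals appear to have an upper limit.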

§3. The Distribution of Certain Properties of Samples Drawn from a Normal Parent Population, and Some Deductions

(3a).  The  distribution  of  u,  s,  and  z 

The normal curve⁸ is fully specified by its mean $\mu$ and S.D. $\sigma$.
If $x_1, x_2, \ldots, x_n$ are n observations of equal reliability and $\bar{x}$ is
their arithmetic mean, the n true errors are defined as $x_i - \mu = \epsilon_i$
and the n residuals as $x_i - \bar{x} = v_i$. By definition, the S.D. of the
parent population is $\sigma$, where

$$\sigma^2 = \sum_{1}^{\nu} (x_i - \mu)^2/\nu = \sum_{1}^{\nu} \epsilon_i^2/\nu. \tag{1}$$

The algebraic form of the normal curve is⁹

$$y\,d\epsilon = [\nu/\sigma\sqrt{2\pi}]\,e^{-\epsilon^2/2\sigma^2}\,d\epsilon. \tag{2}$$

The total area under the curve is $\nu$, the number of observations (and
hence errors) in the parent population.

The “probable error” of a single one (any one) of the observations is that
constant quantity p that divides the area of the curve into quarters.
It is therefore defined by the equation

$$\int_{-p}^{+p} y\,d\epsilon = \tfrac{1}{2}\int_{-\infty}^{+\infty} y\,d\epsilon = \tfrac{1}{2}\nu, \tag{3}$$

wherein y has the value assigned by Eq. (2). The value of p is found to be
an irrational fractional multiple of $\sigma$, namely,

$$p = 0.6744897502\cdots\sigma. \tag{4}$$

It is an even bet that any one of the $\nu$ observations taken at random lies
within $\mu \pm p$, for half of them lie inside $\mu \pm p$ and the other half
outside. Curve (a) in Fig. 1 shows a normal frequency curve and the
abscissas that divide it symmetrically into quarters.

Fig. 1. (a) The normal frequency curve of errors in the parent population;
its S.D., or square root of the second moment about the mean, is $\sigma$. The
area under the curve is the total number of errors, $\nu$. The abscissas
$\pm p = \pm 0.674\cdots\sigma$ divide the curve symmetrically into quarters.
(b) The frequency curve of the errors of the means of N samples of 6 each,
drawn at random from the preceding parent population of errors. This curve
is also normal, but its S.D. is $\sigma/\sqrt{6}$, hence the abscissas that divide
it into quarters are $\pm r = \pm 0.674\cdots\sigma/\sqrt{6}$. The area under the
curve is N, the number of samples.

The division of a symmetrical curve into quarters is called a “quartile”
division, and the distance from the center to the dividing lines on either
side is known as the “quartile distance.” In the normal curve (a) of
Fig. 1, the probable error p is therefore the quartile distance of the $\nu$
observations from the mean $\mu$.

The S.D. s of the sample of n observations is by definition the r.m.s.
residual, so

$$s^2 = \sum_{1}^{n} (x_i - \bar{x})^2/n = \sum_{1}^{n} v_i^2/n. \tag{5}$$

The true error of the mean of the sample will be

$$u = \bar{x} - \mu. \tag{6}$$

s and $\bar{x}$ can always be computed, but u is unknown as long as $\mu$ remains
unknown.

Our study of the theory of errors depends mainly on the distribution of u
and s in samples of n drawn from the parent population. This was first
found by Helmert in three neglected papers that appeared in 1875 and 1876.
He found first an expression for the distribution of $\sum_{1}^{n} \epsilon_i^2$ in a
set of n measurements.¹⁰ The following year, 1876, in discussing the
precision of Bessel’s correction, he had occasion to find the distribution
of s in samples of n.¹¹

Eq. (2) gives the number of errors in the parent population lying in
$\epsilon \pm \tfrac{1}{2}d\epsilon$; whence the probability of the coexistence of n errors
lying in the ranges $\epsilon_i \pm \tfrac{1}{2}d\epsilon_i$ (i = 1, 2, ..., n) is

$$[1/\sigma\sqrt{2\pi}]^{n} \exp\Big(-\sum_{1}^{n} \epsilon_i^2/2\sigma^2\Big)\, d\epsilon_1\, d\epsilon_2 \cdots d\epsilon_n. \tag{7}$$

This can be expressed in terms of u and s by noting the relations between
errors and residuals that are exhibited in Fig. 2 and expressed
algebraically by

$$\epsilon_1 = v_1 + u, \quad \epsilon_2 = v_2 + u, \quad \ldots, \quad \epsilon_n = v_n + u. \tag{8}$$

Fig. 2. An observation falls at $x_i$, and the average of a sample of n
falls at $\bar{x}$. The figure shows the relations between the error $\epsilon_i$,
the residual $v_i$, and the error u of the sample. Here $\mu < \bar{x} < x_i$,
hence $\epsilon_i$, $v_i$, and u are positive, as the arrows indicate.

These follow directly from the definitions. Since the algebraic sum of the
residuals is zero, it is evident that

$$\sum_{1}^{n} \epsilon_i^2 = \sum_{1}^{n} v_i^2 + nu^2 = ns^2 + nu^2. \tag{9}$$

This resembles the formula for the moment of inertia of n points of equal
mass about $\mu$. s is the radius of gyration about $\bar{x}$, and u is the
distance from $\mu$ to $\bar{x}$.

The Jacobian of the transformation (8) is n, so that
$d\epsilon_1\, d\epsilon_2 \cdots d\epsilon_n$ becomes $n\, du\, dv_1\, dv_2 \cdots dv_{n-1}$;
whence the probability of the coexistence of the n residuals
$v_1, \ldots, v_n$ is

$$y\, du\, dv_1\, dv_2 \cdots dv_{n-1} = n[1/\sigma\sqrt{2\pi}]^{n} \exp\Big(-\sum_{1}^{n} v_i^2/2\sigma^2 - nu^2/2\sigma^2\Big)\, du\, dv_1\, dv_2 \cdots dv_{n-1}. \tag{10}$$

⁸ The normal curve is sometimes called a Gaussian error curve. It has been
attributed to Gauss rather than to Laplace solely because Gauss’ Theoria
Motus Corporum Coelestium appeared in 1809, three years prior to the
appearance of Laplace’s Théorie Analytique des Probabilités. But this was
not Laplace’s first treatment of the normal curve; in 1774 (Mémoires ...
présentés à l’Académie, T. vi, p. 628) he arrived at the normal curve as an
approximation to the hypergeometric series, and in 1778 (Mémoire sur les
Probabilités) he dealt further with it and emphasized the need of
tabulating the normal probability integral. Accordingly Laplace should be
credited with the normal curve and its integrals rather than Gauss.
However, both men were considerably antedated by Abraham De Moivre,
according to evidence presented in a historical note by Karl Pearson,
Biometrika 16, 402-404 (1924). De Moivre arrived at the normal curve and
its integrals as approximations to binomial series in about 1721, and
printed his findings under the title Approximatio ad Summam Terminorum
Binomii $(a+b)^n$ in Seriem expansi, dated Nov. 12, 1733. This seven page
pamphlet was bound into the unsold copies of his Miscellanea Analytica as a
second supplement. Only two copies of this book complete with the second
supplement have been reported extant, but these rare pages have been made
generally accessible by a photographic reproduction in a commentary by
R. C. Archibald, Isis 8, 671-683 (1926). De Moivre himself translated the
Approximatio ... into English and amplified it for portions of the second
and third editions of his Doctrine of Chances, published in 1738 and 1756,
respectively. This English translation is quoted in full on pages 567-575
of David Eugene Smith’s A Source Book in Mathematics (McGraw-Hill, 1929).
The essential parts of this translation are found on pages 14-17 of
Helen M. Walker’s History of Statistical Method (Williams and Wilkins,
Baltimore, 1929).

⁹ In this paper, frequency curves will be written in differential form.
y will be used indiscriminately for the ordinates of all of them. The
differential specifies what sort of frequency curve y is the ordinate of,
and the whole expression gives the frequency in the elementary cell. Thus
in Eq. (2), $y\,d\epsilon$ is the number of errors in the interval
$\epsilon \pm \tfrac{1}{2}d\epsilon$.

¹⁰ F. R. Helmert, Schlömilch’s Zeits. f. Math. und Phys. 20, 300-303
(1875); ibid. 21, 192-218 (1876). Helmert’s derivation is reproduced in
Emanuel Czuber’s Beobachtungsfehler (Teubner (1891)) on pages 147-150.

¹¹ F. R. Helmert, Astronomische Nachrichten 88, No. 2096, 122 (1876). This
is given in Czuber’s book on pages 159-163. References to Helmert’s work
are often inaccurately given.
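Two of the relations in this section are easy to check by direct computation: the constant 0.6744897502 that fixes the probable error, and the decomposition of the sum of squared errors into n s² + n u². The sketch below is not from the original paper; the six observations and the value taken for the mean of the parent population are hypothetical, and the function names are introduced only for illustration.

```python
import math

def probable_error_factor(tol=1e-13):
    """Solve erf(p / sqrt(2)) = 1/2 for p by bisection, taking sigma = 1.

    The area of a normal curve between -p and +p is erf(p / (sigma sqrt 2))
    times the total area, so this p is the quartile distance.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.erf(mid / math.sqrt(2.0)) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def error_decomposition(xs, mu):
    """Return (sum of squared true errors, n*s^2 + n*u^2) for a sample xs."""
    n = len(xs)
    xbar = sum(xs) / n
    u = xbar - mu                                  # true error of the mean
    s2 = sum((x - xbar) ** 2 for x in xs) / n      # mean square residual
    sum_eps2 = sum((x - mu) ** 2 for x in xs)      # sum of squared errors
    return sum_eps2, n * s2 + n * u * u

# Hypothetical sample of n = 6 readings and a hypothetical true mean.
sample = [10.2, 9.7, 10.4, 9.9, 10.1, 10.3]
left, right = error_decomposition(sample, mu=10.0)
print(f"probable error factor: {probable_error_factor():.10f}")
print(f"sum of squared errors: {left:.6f}   n s^2 + n u^2: {right:.6f}")
```

The two sides of the decomposition agree for any sample and any assumed mean, since the identity is purely algebraic; only the computation of u requires the mean of the parent population to be known.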


By a clever transformation, Helmert changed the element of volume from
$du\,dv_1\,dv_2 \cdots dv_{n-1}$ in the residual space to the element
$du\,ds$ in the u,s space. A shorter method than Helmert’s is the
geometrical one introduced by Karl Pearson,⁶ which for brevity we shall
follow. Since the integral of the right-hand side of Eq. (10) over all
values of $u, v_1, v_2, \ldots, v_{n-1}$ is convergent, integration with
respect to $v_1, v_2, \ldots, v_{n-1}$ can be accomplished by using an
ellipsoidal shell in the orthogonal $v_1, \ldots, v_{n-1}$ space in place of
the rectangular element $dv_1\,dv_2 \cdots dv_{n-1}$. The volume of the thin
ellipsoidal shell defined by the two surfaces over which $\sum_{1}^{n} v_i^2$
has the pair of constant values $n(s \pm \tfrac{1}{2}ds)^2$ is

$$\frac{2\pi^{\frac{1}{2}(n-1)}\, n^{\frac{1}{2}(n-2)}}{\Gamma[\frac{1}{2}(n-1)]}\, s^{n-2}\, ds,$$

a result that is known from studies in hyper-space.¹² Now since the
right-hand side of Eq. (10) up to the differentials is constant over
either surface of the shell that has just been described, it can be
integrated throughout this shell simply by replacing
$dv_1\,dv_2 \cdots dv_{n-1}$ by
$\{2\pi^{\frac{1}{2}(n-1)} n^{\frac{1}{2}(n-2)}/\Gamma[\frac{1}{2}(n-1)]\}\, s^{n-2}\, ds$.
Multiplication by N, which denotes an indefinitely large number of samples
of n drawn at random from the same parent population, will then give

$$\begin{aligned}
y\,du\,ds &= Nn\,[1/\sigma\sqrt{2\pi}]^{n}\,\frac{2\pi^{\frac{1}{2}(n-1)}\, n^{\frac{1}{2}(n-2)}}{\Gamma[\frac{1}{2}(n-1)]}\, s^{n-2} \exp(-ns^2/2\sigma^2 - nu^2/2\sigma^2)\,du\,ds \\
&= \left[\frac{N\sqrt{n}}{\sigma\sqrt{2\pi}}\, e^{-nu^2/2\sigma^2}\, du\right] \left[\frac{n^{\frac{1}{2}(n-1)}}{\Gamma[\frac{1}{2}(n-1)]\, 2^{\frac{1}{2}(n-3)}} \left(\frac{s}{\sigma}\right)^{n-2} e^{-ns^2/2\sigma^2}\, \frac{ds}{\sigma}\right]
\end{aligned} \tag{11}$$

for the frequency distribution of u and s. $y\,du\,ds$ is the number of
samples that have S.D. in the interval $s \pm \tfrac{1}{2}ds$ and means in the
interval $u \pm \tfrac{1}{2}du$ measured from the mean $\mu$ of the parent
population. n is the number of observations in each sample, and N is the
number of samples.

Eq. (11) is a very important one. In the first place, by integrating it
with respect to s from 0 to $\infty$ there results

$$y\,du = [N\sqrt{n}/\sigma\sqrt{2\pi}]\, e^{-nu^2/2\sigma^2}\, du \tag{12}$$

for the number of means having errors in the interval $u \pm \tfrac{1}{2}du$.
Eq. (12) is another normal curve, and its S.D. is $\sigma/\sqrt{n}$. This is an
important property of samples from a normal parent population. The
probable error r (or the quartile distance) of the mean of n observations
is accordingly $p/\sqrt{n}$; that is,¹³

$$r = 0.674\cdots\sigma/\sqrt{n}. \tag{13}$$

A frequency curve for the means of samples of 6 is given as Curve (b) of
Fig. 1. The vertical lines with abscissas $\pm r$ divide its area
symmetrically into quarters. It is an even bet that the mean of n
observations does not differ from the mean of the parent population by
more than r.

In the second place, integration of u from $-\infty$ to $+\infty$ in Eq. (11)
gives

$$y\,ds = \frac{N n^{\frac{1}{2}(n-1)}}{\Gamma[\frac{1}{2}(n-1)]\, 2^{\frac{1}{2}(n-3)}} \left(\frac{s}{\sigma}\right)^{n-2} e^{-ns^2/2\sigma^2}\, \frac{ds}{\sigma} \tag{14}$$

for the number of samples having S.D. lying in the interval
$s \pm \tfrac{1}{2}ds$ and with $\bar{x}$ lying anywhere. This is equivalent to a
result obtained by Helmert¹¹ in 1876, and for this reason it will be
called “Helmert’s equation.” A graph for n = 6 is shown as Fig. 3. Karl
Pearson¹⁴ has discussed the geometry of these curves. They are decidedly
skew when n is small, but as n increases they become normal about the
point $s = \sigma$ with S.D. $\sigma/\sqrt{2n}$, as Pearson showed analytically and
as is exhibited graphically in Fig. 4. Each full line curve is the true
graph of Helmert’s equation, while the corresponding broken line is a
normal curve of S.D. $\sigma/\sqrt{2n}$ so placed that its center (peak) comes
at the mean $\bar{s}$ of the full line curve. The approaching coincidence of
the full and broken curves with increasing n shows how Helmert’s curves
lose their skewness and become normal with S.D. $\sigma/\sqrt{2n}$.

The mode (maximum) is at

$$s = \sigma[(n-2)/n]^{\frac{1}{2}} \to \sigma(1 - 1/n - 1/2n^2 - \cdots). \tag{15}$$

The mean (first moment of area) is at

$$\bar{s} = \int_0^{\infty} s\,y\,ds \Big/ \int_0^{\infty} y\,ds = \sigma(2\pi/n)^{\frac{1}{2}} \big/ B[\tfrac{1}{2}(n-1), \tfrac{1}{2}] \to \sigma(1 - 3/4n - 7/32n^2 - \cdots). \tag{16}$$

Here¹⁵

$$\frac{1}{B[\frac{1}{2}(n-1), \frac{1}{2}]} = \frac{\Gamma(\frac{1}{2}n)}{\Gamma[\frac{1}{2}(n-1)]\,\Gamma(\frac{1}{2})} =
\begin{cases}
\dfrac{1}{\pi}\cdot\dfrac{n-2}{n-3}\cdot\dfrac{n-4}{n-5}\cdot\dfrac{n-6}{n-7}\cdots\dfrac{6\cdot 4\cdot 2}{5\cdot 3\cdot 1} = \dfrac{2^{n-2}\{(\frac{1}{2}n-1)!\}^2}{\pi\,(n-2)!}, & n \text{ even} \\[2ex]
\dfrac{1}{2}\cdot\dfrac{n-2}{n-3}\cdot\dfrac{n-4}{n-5}\cdot\dfrac{n-6}{n-7}\cdots\dfrac{5\cdot 3\cdot 1}{4\cdot 2} = \dfrac{(n-2)!}{2^{n-2}\{[\frac{1}{2}(n-3)]!\}^2}, & n \text{ odd.}
\end{cases}$$

The last parenthesis in Eq. (16) comes from applying the De Moivre-Stirling
approximation

$$m! = (2\pi m)^{\frac{1}{2}}(m/e)^{m}(1 + 1/12m + 1/288m^2 - 139/51840m^3 + \cdots) \tag{17}$$

to the factorials that arise from the beta function.

The median s of one of these curves is the abscissa that divides its area
into halves. This abscissa will be some multiple of $\sigma$, say $\sigma/f$,
which by definition will satisfy

$$\int_0^{\sigma/f} y\,ds = \tfrac{1}{2}N, \tag{18}$$

wherein the integrand is given by Eq. (14). The upper limit can be found
by inverse interpolation in the Tables of the Incomplete Gamma Function.¹⁶
These calculations have been made for us by Lola S. Deming, and are given
in Table I, together with the abscissas of the mean and mode. The
positions of the mean, median, and mode are shown graphically for n = 6 in
Fig. 3. Clearly, as n increases, the mean, median, and mode all approach
the value $\sigma$, as is already evident from the discussion of Fig. 4.

Table I. The median $\sigma/f$ of the standard deviation frequency curves.
$\sigma/f$ is defined by Eq. (19) [equation not legible]. Comparison with the
mean and mode. The mode is $\sigma\sqrt{(n-2)/n}$ and the mean is
$\sigma(2\pi/n)^{\frac{1}{2}}/B[\frac{1}{2}(n-1), \frac{1}{2}]$. All entries are in
units of $\sigma$.

     n      Median         Mode           Mean
     2      0.476 9363     0              0.564 1896
     3      0.679 7782     0.577 3503     0.723 6012
     4      0.769 0862     0.707 1068     0.797 8846
     5      0.819 3527     0.774 5967     0.840 7487
     6      0.851 6120     0.816 4966     0.868 6267
     7      0.874 0808     0.845 1543     0.888 2029
     8      0.890 6326     0.866 0254     0.902 7033
     9      0.903 3347     0.881 9171     0.913 8749
    10      0.913 3911     0.894 4272     0.922 7456
    11      0.921 5509     0.904 5340     0.929 9598
    12      0.928 3048     0.912 8709     0.935 9418
    13      0.933 9874     0.919 8662     0.940 9825
    14      0.938 8347     0.925 8201     0.945 2877
    15      0.943 0191     0.930 9493     0.949 0076
    16      0.946 6671     0.935 4143     0.952 2538
    17      0.949 8761     0.939 3364     0.955 1115
    18      0.952 7207     0.942 8090     0.957 6464
    19      0.955 2598     0.945 9053     0.959 9103
    20      0.957 5399     0.948 6833     0.961 9445
    21      0.959 5989     0.951 1897     0.963 7823
    22      0.961 4675     0.953 4626     0.965 4507
    23      0.963 1706     0.955 5331     0.966 9721
    24      0.964 7297     0.957 4271     0.968 3652
    25      0.966 1620     0.959 1663     0.969 6456
    49      0.982 8634     0.979 3792     0.984 6022
    75      0.988 8337     0.986 5766     0.989 9609

Fig. 3. Frequency distribution of the standard deviation s in samples of 6
from a parent population whose standard deviation is $\sigma$. The curve is
$y\,ds$ of Eq. (14) with n = 6. N is the number of samples; n is the number
in each sample and is here equal to 6. The mode comes at
$s/\sigma = 0.8165$. The median comes at $s/\sigma = 0.8516$. The mean comes at
$s/\sigma = 0.8686$.

¹² Instead of using the actual volume of the ellipsoidal shell, it is
perhaps more convenient simply to say that the volume contained between
the two ellipsoidal surfaces must be some constant times $s^{n-2}\,ds$,
since it is in a space of n − 1 dimensions. Then from Eq. (10)

$$y\,du\,ds = \text{const.}\; s^{n-2} \exp(-ns^2/2\sigma^2 - nu^2/2\sigma^2)\,du\,ds$$

will be the frequency distribution of u and s if the factor of
proportionality is properly chosen. This factor can be found by equating
$(1/N)\int_0^{\infty}\int_{-\infty}^{\infty} y\,du\,ds$ to unity; its value so
determined and inserted back into the expression for $y\,du\,ds$ gives
Eq. (11) immediately.

¹³ A table showing the factor $0.674\cdots/\sqrt{n}$ to five figures, for n
running from 1 to 1000, was published by Winifred Gibson, Biometrika 4,
385-393 (1906). This is reproduced as Table V in the Tables for
Statisticians and Biometricians, Part I. Table 26 in the Smithsonian
Physical Tables shows $0.674\cdots/\sqrt{n-1}$ to four figures up to n = 99,
whence the factor $0.674\cdots/\sqrt{n}$ can be read if one takes care to
increase the argument by unity.

¹⁴ Karl Pearson, Biometrika 10, 522-529 (1915).

¹⁵ These products can be derived from the recursion formula
$\Gamma(n+1) = n\,\Gamma(n)$, which leads to

$$\Gamma(\tfrac{1}{2}n) =
\begin{cases}
(\tfrac{1}{2}n - 1)!, & n \text{ even} \\[1ex]
\dfrac{(n-2)!\,\sqrt{\pi}}{2^{n-2}\,[\frac{1}{2}(n-3)]!}, & n \text{ odd}
\end{cases}$$

since $\Gamma(\tfrac{1}{2}) = \sqrt{\pi}$.
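The entries of Table I can be recomputed without recourse to printed tables. The sketch below is an illustration and not the authors’ procedure: in place of inverse interpolation in the Tables of the Incomplete Gamma Function it evaluates the incomplete gamma ratio by its power series and finds the median by bisection. It uses the fact, which follows from Helmert’s equation, that the fraction of samples with standard deviation below s is the incomplete gamma ratio with index ½(n − 1) and argument ns²/2σ². All function names are hypothetical.

```python
import math

def incomplete_gamma_ratio(a, x, max_terms=2000):
    """Ratio of the incomplete to the complete gamma function, by power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for k in range(1, max_terms):
        term *= x / (a + k)
        total += term
        if term < 1e-17 * total:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def mode_ratio(n):
    """Mode of s in units of sigma: sqrt((n - 2) / n)."""
    return math.sqrt((n - 2) / n)

def mean_ratio(n):
    """Mean of s in units of sigma: sqrt(2/n) Gamma(n/2) / Gamma((n-1)/2)."""
    return math.sqrt(2.0 / n) * math.exp(math.lgamma(0.5 * n) - math.lgamma(0.5 * (n - 1)))

def median_ratio(n, tol=1e-12):
    """Median of s in units of sigma, found by bisection on the area below s."""
    a = 0.5 * (n - 1)
    lo, hi = 0.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if incomplete_gamma_ratio(a, 0.5 * n * mid * mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for n in (2, 6, 25, 75):
    print(f"n = {n:2d}  median {median_ratio(n):.7f}  "
          f"mode {mode_ratio(n):.7f}  mean {mean_ratio(n):.7f}")
```

For n = 6 the three values should agree closely with the corresponding row of Table I and with the abscissas marked in Fig. 3; the mean is written here with gamma functions, which is equivalent to the beta function form of Eq. (16).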

In the third place, it is evident that the
distributions of u and s in Eq. (11) are completely
independent; in any sample u may be large and s
small, and conversely. This is the fundamental
reason for the difficulties that are encountered in
attempting to find the true mean $\mu$ and the
probable error of $\bar{x}$ when the only information at
hand is that provided by the sample itself.
These difficulties disappear as n increases, as
will be clear from a later section. The independence
of u and s is a property peculiar to
samples drawn from a normal parent population.
This property does not hold for a non-normal
distribution.
The fourth result to be derived from the
simultaneous distribution of u and s is the
distribution of u/s, with s lying anywhere
between 0 and $\infty$. u/s can be thought of as the
distance from the mean of the sample to the
mean of the parent population measured in
terms of the S.D. of the sample. The distribution
of u/s was first found by Student¹⁷ in 1908. To
accomplish this he needed the distribution of s.
Unaware of Helmert’s work, Student established
the distribution of s beyond reasonable doubt by
an ingenious empirical process. Then after
proving that there is no correlation between u
and s, nor between $u^2$ and $s^2$, he assumed that u
and s are independent, and proceeded by the
following method to find the distribution of u/s.


In Eq. (11) let u/s be replaced by z. Then if s and z be used as
orthogonal axes in place of u and s, the elementary volume $y\,du\,ds$
becomes $y\,s\,ds\,dz$, so that the simultaneous distribution of s and z is

$$y\,ds\,dz = \frac{N n^{\frac{1}{2}n}}{\sqrt{2\pi}\,\Gamma[\frac{1}{2}(n-1)]\, 2^{\frac{1}{2}(n-3)}} \left(\frac{s}{\sigma}\right)^{n-1} e^{-ns^2(1+z^2)/2\sigma^2}\, \frac{ds}{\sigma}\, dz. \tag{20}$$

Integration of this with respect to s from 0 to $\infty$ gives

$$y\,dz = \frac{N}{B[\frac{1}{2}(n-1), \frac{1}{2}]}\,(1+z^2)^{-\frac{1}{2}n}\, dz \tag{21}$$

for the number of samples having z in the range $z \pm \tfrac{1}{2}dz$ and any
S.D. s whatever. This is called “Student’s distribution.” The most
important property of this equation is the absence of $\sigma$. Student’s
1908 paper was a powerful stimulus to the theory of sampling, not alone
for the distribution of u/s but for the distribution of s itself, since
not until long afterward was Helmert’s prior work discovered by
statisticians.¹⁸

Student’s curves are symmetrical in z, as would be expected, since for any
value of s, u is as likely to be positive as negative. As n increases they
become normal near the center, with S.D. $1/(n-3/2)^{\frac{1}{2}}$. The full
line curve in Fig. 5 is Student’s distribution for n = 5. The dashed ones
are the normal curves of S.D. $1/(n-1)^{\frac{1}{2}}$, $1/(n-2)^{\frac{1}{2}}$, and
$1/(n-3)^{\frac{1}{2}}$ for comparison. The figure shows that a normal curve of
S.D. $1/(n-3/2)^{\frac{1}{2}}$ will fall very close to Student’s distribution,
especially near the center; in fact such a curve could not be shown on the
same figure without confusion and so has been omitted. The agreement in
the quartile distances of the two curves is shown in Table II, which will
be needed later.

Fig. 4. Frequency distribution of the standard deviation s in samples of n
from a parent population whose standard deviation is $\sigma$. The full line
curves are $y\,ds$ of Eq. (14); the broken line curves are
$y\,ds = (N\sqrt{n}/\sigma\sqrt{\pi})\, e^{-n(s-\bar{s})^2/\sigma^2}\, ds$. N is the
number of samples. $\bar{s}/\sigma$ is the abscissa of the center of area for a
particular full line curve. These curves illustrate the mode approaching
the mean and the frequency distribution of s becoming normal with standard
deviation $\sigma/(2n)^{\frac{1}{2}}$, as n increases.

Fig. 5. Student’s distribution, Eq. (21) (full line), and normal
distributions of S.D. $1/\sqrt{m}$ (dashed),
$y\,dz = N\sqrt{m/2\pi}\, e^{-mz^2/2}\, dz$, m = n − 1, n − 2, n − 3. The
curves are plotted for n = 5. The normal curve of S.D.
$1/(n-3/2)^{\frac{1}{2}}$ is not shown because it lies so close to Student’s
distribution that it would cause confusion. The area under all curves is
N, the number of samples. n is the number in each sample. z is the
distance from the mean of the sample to the mean of the parent population
(the true value), measured in terms of the S.D. s of the sample.

¹⁶ Tables of the Incomplete Gamma Function, edited by Karl Pearson,
published by His Majesty’s Stationery Office, Imperial House, Kingsway,
London W. C. 2 (1922). The incomplete gamma function is defined by the
integral

$$\Gamma_x(n) = \int_0^x e^{-t}\, t^{n-1}\, dt.$$

In the same symbolism the complete gamma function would be
$\Gamma_\infty(n)$, but for brevity and by convention we drop the subscript
$\infty$ and write simply $\Gamma(n)$. The left-hand side of Eq. (18) is
$N\,\Gamma_\eta[\frac{1}{2}(n-1)]/\Gamma[\frac{1}{2}(n-1)]$, where $\eta = n/2f^2$.

¹⁷ Student, Biometrika 6, 1-25 (1908); 11, 414-417 (1915-17).

¹⁸ Karl Pearson, Biometrika 23, 416-418 (1931-32).

(3b). The u,s frequency surface

The simultaneous distribution of u and s is important not only for the
four conclusions that have already been deduced from it, but also because
it is the equation of the “u,s frequency surface,” a surface whose
altitude y on the orthogonal axes u and s is given by Eq. (11). The
elementary volume $y\,du\,ds$ is the number of samples whose errors fall in
the range $u \pm \tfrac{1}{2}du$ while their standard deviations fall in the
range $s \pm \tfrac{1}{2}ds$; consequently, by integration, the volume erected
on any closed figure in the u,s plane is the number of samples whose
errors and standard deviations fall simultaneously within the ranges
defined by the boundary of the given figure. The total volume under the
surface is N, the number of samples. The authors have found this surface
to be extremely valuable in describing certain properties of small
samples.

Because of the complete independence of u and s, all plane sections
u = const. on this surface will be skew curves similar to the curve
defined by Helmert’s equation, which has already been discussed. Fig. 3 is
then typical of any of these curves. They will all have the same mode,
mean, and median that have been found in Eqs. (15), (16), (18), for
Helmert’s equation. As n increases, the mode, mean, and median approach
coincidence with the value $\sigma$ while the curves lose their skewness and
become normal with center at $s = \sigma$ and with S.D. $\sigma/\sqrt{2n}$.

The s = const. curves will be normal, all with center at u = 0 and with
S.D. $\sigma/\sqrt{n}$. Clearly, as n increases, the u,s frequency surface
becomes more and more concentrated about the point u = 0, $s = \sigma$. Two
u,s frequency surfaces are represented by sections in Fig. 6. The one on
the left is for a small value of n and the one on the right is for a
comparatively large value of n.

Fig. 6. The frequency surface $y\,du\,ds$ of Eq. (11), illustrated by
sections. As n increases, the volume becomes more and more concentrated
about the point u = 0, $s = \sigma$. The total volume is always N, the number
of samples. The s = const. curves are always normal with
S.D. $= \sigma/\sqrt{n}$. The u = const. curves approximate normal curves with
S.D. $= \sigma/\sqrt{2n}$ as n increases sufficiently.
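Before the tests of the next section are taken up, Student’s distribution of z = u/s lends itself to a numerical check. The sketch below is illustrative and not part of the original paper. It takes the distribution in the form of a constant times (1 + z²) raised to the power −n/2, with the constant Γ(n/2)/[√π Γ((n − 1)/2)] and N set to unity, confirms that the total area is unity, and compares the quartile distance for n = 5 with that of a normal curve of S.D. 1/√(n − 3/2). The function names and the choice of Simpson’s rule are assumptions of this sketch.

```python
import math

def z_constant(n):
    """Normalizing constant Gamma(n/2) / (sqrt(pi) Gamma((n-1)/2))."""
    return math.exp(math.lgamma(0.5 * n) - math.lgamma(0.5 * (n - 1))) / math.sqrt(math.pi)

def z_area(q, n, steps=2000):
    """Fraction of samples with 0 <= z <= q, by Simpson's rule.

    The substitution z = tan(theta) turns (1 + z^2)^(-n/2) dz into
    cos(theta)^(n-2) d(theta), so that an infinite range becomes finite.
    """
    theta_max = math.atan(q)
    h = theta_max / steps
    total = 1.0 + math.cos(theta_max) ** (n - 2)
    for i in range(1, steps):
        weight = 4.0 if i % 2 else 2.0
        total += weight * math.cos(i * h) ** (n - 2)
    return z_constant(n) * total * h / 3.0

def z_quartile(n, tol=1e-10):
    """Quartile distance of z: the q for which the area from 0 to q is 1/4."""
    lo, hi = 0.0, 50.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if z_area(mid, n) < 0.25:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n = 5
total_area = 2.0 * z_area(math.inf, n)
student_q = z_quartile(n)
normal_q = 0.6744897502 / math.sqrt(n - 1.5)   # quartile of the comparison normal curve
print(f"total area {total_area:.10f}")
print(f"quartile distance: Student {student_q:.4f}, normal approximation {normal_q:.4f}")
```

Neither σ nor the mean of the parent population enters the computation, which is the property of Student’s distribution that the text singles out. For n = 5 the two quartile distances differ by about one unit in the second decimal place.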

(3c).  Tests  for  hypotheses  concerning  the  parent 
population 

Since the frequency surface for a normal
parent population is completely determined when
its mean $\mu$ and S.D. $\sigma$ are given, it is sufficient
in our problem to state that the object of making
n observations is to enable something to be
conjectured regarding the mean or the S.D., or
both, of the hypothetical indefinitely large
number of observations that might be taken and
of which the n observations constitute a
sample. By keeping in mind the u,s frequency
surface it is possible to make certain objective
statements regarding the parent population from
which a sample is drawn.

As long as the parent population remains
unknown, the position of a sample in the u,s
plane remains unknown so far as its u coordinate
is concerned. The S.D. s and the mean $\bar{x}$ can be
computed for the sample, but the error $u = \bar{x} - \mu$
obviously cannot be computed, for $\mu$ is unknown.
Moreover, on account of the independence of u
and s, the known value of s gives us no clue
regarding the value of u; however, it may help us
to lay odds on any specified range within which u
might be found.

Since  the  same  sample  can  come  from  many 
sources,  the  exact  parent  population  cannot  be 
determined  from  the  sample.  On  the  other  hand, 
considerations  of  the  u,s  frequency  surface  are 
often very helpful in deciding whether a suggested
hypothesis regarding the parent population
is improbable. To be more specific, there are
certain  tests  which  determine  the  probability 
that  the  given  sample  could  have  been  drawn 
from  a suggested  parent  population — that  is,  a 
parent  population  having  a proposed  mean  and 
S.D.  These  various  tests  will  not  all  give  the 
same  answer  to  the  problem,  in  fact  at  times  they 
may  differ  so  widely  that  a suggested  hypothesis 
will  be  accepted  on  the  basis  of  one  test  but 
rejected  by  another.  Such  a situation  is,  of  course, 
a difficult  one,  but  it  is  apt  to  arise  when  dealing 
with  small  samples.  The  larger  n is,  the  finer  will 
be  the  distinctions  that  can  be  drawn  between 
one  hypothesis  and  another,  and  the  closer  will 
all tests agree. In the limit, as n becomes infinite,
the sample becomes identical with the parent
population and any proposed hypothesis can be
decided with certainty. However, n is for
various reasons usually limited to a small integer,
and the problem is to learn how much can be
safely inferred from such a sample.

Fig. 7. Contours in the u,s plane. A sample of S.D. s
and of error u can be plotted in the u,s plane. The sample
point (u,s) lies on the four contours shown:
|u| = const., |z| = |u/s| = const., s = const., λ = const.

By proposing values of μ and σ, a u coordinate
for the sample is provided for testing purposes,
and the sample may be placed at the point
(u,s) in Fig. 7, and certain conclusions drawn.
The volume of the u,s frequency surface lying
outside any one of certain contours that pass
through the point furnishes a test of the hypothesis.

Through the given point in the u,s plane there
can be drawn five contours that divide the
volume symmetrically each side of the s axis.
They are

$$\pm u = \text{const.}, \tag{22}$$

$$s = \text{const.}, \tag{23}$$

$$\pm z = u/s = \text{const.}, \tag{24}$$

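$$\delta = s^{n-2} \exp\left[-\tfrac{1}{2} n (u^2 + s^2)/\sigma^2\right] = \text{const.}, \tag{25}$$

$$\lambda = (s/\sigma)^n \exp\left[-\tfrac{1}{2} n (u^2 + s^2 - \sigma^2)/\sigma^2\right] = \text{const.} \tag{26}$$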

The first three are straight lines extending to
infinity; the last two are oval closed curves
surrounding the highest point (u = 0,
s = σ[(n − 2)/n]^½) of the volume defined by Eq.
(11). Only the z contours are independent of σ.

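A certain fraction of the volume lies outside the
symmetrically placed u contours AA and BB;
this fraction is the probability of drawing a
sample of n items having an absolute error in
their mean greater than the proposed value of u.
This fractional part of the volume can be
computed easily from a table of the normal
probability integral. Its value is

$$
\begin{aligned}
P_u &= 2 \int_u^\infty \frac{\sqrt{n}}{\sigma\sqrt{2\pi}}\, e^{-nu^2/2\sigma^2}\, du
      \int_0^\infty \frac{n^{\frac{1}{2}(n-1)}}{\Gamma[\frac{1}{2}(n-1)]\, 2^{\frac{1}{2}(n-3)}}
      \left(\frac{s}{\sigma}\right)^{n-2} e^{-ns^2/2\sigma^2}\, \frac{ds}{\sigma} \\
    &= \frac{2\sqrt{n}}{\sigma\sqrt{2\pi}} \int_u^\infty e^{-nu^2/2\sigma^2}\, du
     = 1 - \sqrt{2/\pi} \int_0^{u/(\sigma/\sqrt{n})} e^{-t^2/2}\, dt.
\end{aligned}
\tag{27}
$$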


If  Pu  turns  out  to  be  small,  say  0.01,  then  only  once  in  100  trials  could  the  mean  of  the  sample  be 
expected  to  differ  so  widely  from  the  mean  of  the  proposed  parent  population;  in  such  a case  the 
hypothesis  would  immediately  be  placed  under  suspicion,  but  it  cannot  be  definitely  rejected  until 
other  tests  have  been  made  and  the  circumstances  carefully  reviewed.  On  the  other  hand,  if  Pu  turns 
out  to  be  fairly  large,  say  0.2  or  higher,  then  in  at  least  1 trial  out  of  5 a greater  error  would,  in  the 
long  run,  be  obtained,  and  there  would  be  no  grounds  for  rejecting  the  hypothesis  on  this  criterion. 
The  test  just  described  will  be  called  the  “u  test.” 
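As a hedged illustration of the u test, the last form of Eq. (27) can be evaluated with the complementary error function instead of a printed table. The function name `p_u` is an assumption made for this sketch, and the numbers are those of the micrometer example given later in this section (u = 0.0020, n = 10, with σ = 0.0040 or 0.0025).

```python
import math


def p_u(u, sigma, n):
    """Two-sided chance that the error of the mean exceeds |u| in magnitude.

    Uses P_u = erfc(x / sqrt(2)), where x = |u| / (sigma / sqrt(n)) is the
    upper limit of the last integral in Eq. (27).
    """
    x = abs(u) / (sigma / math.sqrt(n))
    return math.erfc(x / math.sqrt(2.0))


# Micrometer example: proposed error 0.0020 in the mean of 10 readings.
p_large_sigma = p_u(0.0020, 0.0040, 10)   # the text reads 0.114 from Fig. 8
p_small_sigma = p_u(0.0020, 0.0025, 10)   # the text quotes 0.0114
print(round(p_large_sigma, 4), round(p_small_sigma, 4))
```

The argument passed to `erfc` is the upper limit of Eq. (28), so the same function serves either form of the integral.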

The upper limit in the last integral of Eq. (27) is the ratio of u to σ/√n, i.e., the ratio of u to the
S.D. of the means of samples of n. In this form the value of P_u is easily found from Sheppard's Table.¹⁹
If the form of the integral in Eq. (27) is changed so that

$$P_u = 1 - \frac{2}{\sqrt{\pi}} \int_0^{u/\sqrt{2\sigma^2/n}} e^{-t^2}\, dt, \tag{28}$$

the upper limit becomes the argument in various other tables of the normal probability integral.²⁰ The
upper limit could also be made to depend on the ratio u/r with an attending increase in convenience
for some problems; thus,


19 W. F. Sheppard, Biometrika 2, 174-190 (1902). This
table is reproduced as Table II in Tables for Statisticians
and Biometricians, Part I. The upper limit in the integral
of Eq. (27) is Sheppard's x, and our P_u is his 1 − α or
2[1 − ½(1 + α)].

20 The first table of the normal probability integral was
computed by Kramp and published in his Analyse
des Réfractions, pp. 195-206 (Strasbourg, 1799). This
formed the basis for all tables down to 1898, when James
F. Burgess in the Trans. Roy. Soc. Edinburgh 39, Part II,
pp. 257-322 (1898) tabulated the integral in Eq. (28) to
15 decimals, together with first and second differences,
the argument being the upper limit of this integral and
proceeding in steps of 0.001 from 0 to 1.499 and then in
steps of 0.002 from 1.500 to 3. Shorter tables, based on
Burgess', are given in B. O. Peirce's A Short Table of
Integrals (Ginn and Company), in the Smithsonian
Physical Tables (pp. 56 and 57 of the 7th and 8th editions),
and in many texts on the theory of errors, least squares,
and statistics. Notable also is the Kelley-Wood table,
Appendix C of Truman L. Kelley's Statistical Method
(Macmillan, 1924), where the upper limit of the integral
in Eq. (28) is tabulated with ½(1 − P_u) as argument in
steps of 0.001 from 0 to 0.499. Among the handiest tables
for P_u are Tables I and II in R. A. Fisher's Statistical
Methods for Research Workers (page 79 in the fourth
edition), where u/(σ/√n) is listed to six decimals for values
of P_u proceeding in steps of 0.01 from P_u = 0.01 to P_u = 1.00,
and also for certain much smaller values of P_u given as
negative powers of 10. It is interesting to
note that the “Diffusion Integral” of Table 31 in the 7th
and 8th editions of the Smithsonian Physical Tables is
just our P_u.
$$
P_u = 1 - \frac{2}{\sqrt{\pi}} \int_0^{\gamma u/r} e^{-t^2}\, dt
    = 1 - \frac{2\gamma}{\sqrt{\pi}} \int_0^{u/r} e^{-\gamma^2 t^2}\, dt
    = \frac{2}{\sqrt{\pi}} \int_{\gamma u/r}^{\infty} e^{-t^2}\, dt,
\tag{29}
$$

wherein γ = 0.674···/√2 = 0.476936276···. The probability integral was first tabled with u/r as
argument by Encke,²¹ and this form has been adopted by several later writers.

The necessity of using tables of the normal probability integral is to a large measure obviated by
Fig. 8, which shows closely enough for most purposes the chance P_u of the occurrence of an error (in
the absolute sense) as great as or greater than given multiples both of the S.D. (or r.m.s. error) σ/√n
and of the probable error r = 0.674···σ/√n.

Tables of the normal probability integral generally tabulate the internal portion of the area under
the normal curve, that is, the unshaded portion of the area in the upper right-hand corner of Fig. 8.
This internal area is to be subtracted from the whole area (unity) to obtain the external portion,
which we designate by P_u. The reader should note carefully that in the headings of some tables, the
letter P is used for the internal portion of the area, and is then just the complement of our P_u.

The other contours in Fig. 7 provide other tests. Thus, the fraction P_s of the volume that lies above
the s contour EE is the chance of drawing a sample of n having a S.D. greater than s. This leads to
another type of probability integral, the incomplete gamma function, which has been tabled by Karl
Pearson and his staff. From Eq. (11) the fraction of the volume above EE in Fig. 7 is

$$
\begin{aligned}
P_s &= \int_{-\infty}^{\infty} \frac{\sqrt{n}}{\sigma\sqrt{2\pi}}\, e^{-nu^2/2\sigma^2}\, du
      \int_s^\infty \frac{n^{\frac{1}{2}(n-1)}}{\Gamma[\frac{1}{2}(n-1)]\, 2^{\frac{1}{2}(n-3)}}
      \left(\frac{s}{\sigma}\right)^{n-2} e^{-ns^2/2\sigma^2}\, \frac{ds}{\sigma} \\
    &= 1 - \frac{\Gamma_v[\frac{1}{2}(n-1)]}{\Gamma[\frac{1}{2}(n-1)]},
\end{aligned}
\tag{30}
$$

wherein v = ns²/2σ², and Γ_v and Γ represent the
incomplete and the complete gamma functions.
Here it should be noted that the ratio of s to σ is
required in order that this integral can be found,
but no value of μ is needed. If P_s is small, there is
an equally small chance that a sample of S.D. as
large as the known s could have been drawn from
a parent population having the suggested S.D. σ,


21 Encke, Berliner Astronomisches Jahrbuch für 1834,
pp. 249-312 (1832). The tables on these pages are reproduced
in Encke's Astronomische Abhandlungen Vol. 1, No.
7 (Berlin, 1866). Kramp's tables (see preceding footnote)
formed the basis for Encke's computations. Abbreviations
of Encke's tables are given in several more recent books,
among which are T. W. Wright's Adjustment of Observations
(Van Nostrand, 1884; revised by J. F. Hayford in 1906),
David Brunt's Combination of Observations (Cambridge
University Press, 1917), W. W. Johnson's Theory of Errors
and Method of Least Squares (John Wiley, 1912), A. de
Forest Palmer's Theory of Measurements (McGraw-Hill,
1912), and the Smithsonian Physical Tables, page 57.


and the interpretation is that the hypothesis, as
far as σ is concerned, is unlikely. If P_s turns out
to be nearly unity, it is practically certain that if
the suggested σ were the true value, the S.D.
of the sample would have been larger than that
observed. Hence the suggested value of σ would
again appear unlikely. When P_s is anywhere near
½, there is no ground for rejecting the hypothesis
on the basis of this criterion. This test will be
called the “s test.”

Instead of using tables of the incomplete
gamma function for calculating P_s, it is usually
easier in this work to use tables for the chi-test.
In the chi-tables, P(χ²) depends on two arguments,
χ² and the number of “degrees of freedom.”
P_s will be identical with P(χ²) if ns²/σ² replaces χ²
and if n − 1 be taken for the number of degrees of
freedom. In Elderton's table the number of
degrees of freedom is denoted by n′ − 1, and in
Fisher's table it is denoted by n. The identity of
P_s and P(χ²) is more than a mathematical
coincidence, for it turns out in studying the chi-test
that χ² actually is ns²/σ² for n observations
made on a single magnitude; but we cannot
pursue the matter further here. Besides the
chi-tables, another short cut to calculating P_s is
possible when n is around 20 or more, for then the
normal curve of S.D. σ/√(2n) is a close enough
approximation to Helmert's equation near the
mode, as was learned from Fig. 4, and a table of
the normal probability integral can be used to
ascertain whether the sample is unusual. For
smaller values of n this approximation may not
be close enough.
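The s test of Eq. (30) can be sketched in a few lines without tables. The sketch below is only an illustration: the function names are invented here, and the incomplete gamma ratio is built from the standard recurrence Q(a + 1, x) = Q(a, x) + x^a e^(−x)/Γ(a + 1), starting from Q(½, x) = erfc(√x) or Q(1, x) = e^(−x). This covers every case that arises, since ½(n − 1) is always a multiple of ½. The numbers are those of the micrometer example later in this section.

```python
import math


def upper_gamma_ratio(a, x):
    """Return 1 - Gamma_x(a)/Gamma(a) for a = 1/2, 1, 3/2, 2, ..."""
    twice_a = round(2 * a)
    if twice_a < 1 or abs(2 * a - twice_a) > 1e-12:
        raise ValueError("a must be a positive multiple of 1/2")
    if twice_a % 2 == 1:
        k, q = 0.5, math.erfc(math.sqrt(x))   # Q(1/2, x)
    else:
        k, q = 1.0, math.exp(-x)              # Q(1, x)
    while k < a - 1e-9:
        # Recurrence: Q(k + 1, x) = Q(k, x) + x**k * exp(-x) / Gamma(k + 1)
        q += x ** k * math.exp(-x) / math.gamma(k + 1.0)
        k += 1.0
    return q


def p_s(s, sigma, n):
    """Chance that a sample of n has a S.D. greater than s (divisor n in s)."""
    v = n * s * s / (2.0 * sigma * sigma)     # v of Eq. (30)
    return upper_gamma_ratio((n - 1) / 2.0, v)


# S.D. 0.00355 from 10 readings, tested against sigma = 0.0025 and 0.0040.
print(round(p_s(0.00355, 0.0025, 10), 4), round(p_s(0.00355, 0.0040, 10), 2))
```

The same number is the chi-table entry P(χ²) for χ² = ns²/σ² with n − 1 degrees of freedom, which is the identity described above.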

A third criterion comes from the z contours.
CO and OD are drawn making the angles ± arc
tan u/s with the s axis. The fraction P_z of the
volume lying outside these contours is the
probability of drawing a sample of n having a
ratio of u to s greater than the ratio arising from
the proposed u and observed s. The calculation of
P_z leads to a third type of probability integral,
the incomplete beta function; however, the
special type here encountered is generally known
as “Student's integral,” since it is simply an
integral of Student's distribution and Student
himself prepared rather extensive tables. From
Eq. (21)


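$$
P_z = 1 - \frac{2\,\Gamma(\frac{1}{2}n)}{\sqrt{\pi}\,\Gamma[\frac{1}{2}(n-1)]}
      \int_0^z \frac{dz}{(1+z^2)^{n/2}}.
\tag{31}
$$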


P_z is the fractional part of the area lying beyond
±z under Student's distribution of z (Fig. 5),
just as P_u is the fractional part of the area lying
beyond ±u under the normal distribution of u,
and shown shaded in the upper right-hand corner
of Fig. 8. For the calculation of Student's integral
it is not necessary to postulate a value of σ, since
P_z is simply the probability that a sample of n
will fall outside a proposed pair of z contours, and
these are independent of σ. Probably the handiest
scheme for looking up the value of Student's
integral is with the nomograph devised by V. A.
Nekrassoff and reproduced as our Fig. 9 with
the kind permission of the Bell Telephone
Laboratories. The curved portion of the z√(n − 1)
scale will give better results than the straight
portion, which it supersedes over a short range,
but both the curved and straight portions will
give practically the same results.

The reader familiar with Fisher's methods will
realize that the z test here described is equivalent
to his t test for the significance of the mean of a
single sample.
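In place of the nomograph, Student's integral can be evaluated numerically. The sketch below is an illustration under stated assumptions, not the method of the text: it takes Student's distribution of z in the standard form proportional to (1 + z²)^(−n/2), substitutes z = tan θ so that the tail becomes an integral of cos^(n−2) θ over a finite range, and applies Simpson's rule. The name `p_z` and the step count are choices made here.

```python
import math


def p_z(z, n, steps=2000):
    """Two-sided tail area beyond +/-z under Student's distribution of z = u/s.

    With z = tan(theta), the tail integral of (1 + z^2)^(-n/2) dz becomes
    the integral of cos(theta)^(n-2) from atan(|z|) to pi/2.
    Requires n >= 2 and an even number of steps (Simpson's rule).
    """
    c = math.gamma(n / 2.0) / (math.sqrt(math.pi) * math.gamma((n - 1) / 2.0))
    lo, hi = math.atan(abs(z)), math.pi / 2.0
    h = (hi - lo) / steps
    total = math.cos(lo) ** (n - 2) + math.cos(hi) ** (n - 2)
    for i in range(1, steps):
        weight = 4.0 if i % 2 else 2.0
        total += weight * math.cos(lo + i * h) ** (n - 2)
    return 2.0 * c * total * h / 3.0


# Micrometer example from later in this section: n = 10, z = 0.0020/0.00355.
print(round(p_z(0.0020 / 0.00355, 10), 3))   # the text reads 0.13 from Fig. 9
```

Because t = z√(n − 1) with n − 1 degrees of freedom, the same value could be looked up in a two-sided t table at t = 0.563 × 3 ≈ 1.69 with 9 degrees of freedom.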

A very small value of P_z signifies that the
sample has an exceptionally large value of z;
thus, on the average, only once in 1000 trials will
u/s (= z) be so large that P_z = 0.001. In such
cases either the proposed error u is unusually
large or else the S.D. s of the sample is accidentally
very low. Evidently, then, if we reject the idea
that the error in x̄ is as great as the proposed
value of u every time P_z turns out to be small, we
shall occasionally reject a perfectly good hypothesis,
for not only will the error in the sample
actually be large sometimes but also the S.D. s
will occasionally be unusually small. When,
however, P_z is closer to unity, say 0.2 or greater,
the sample is not unusual, and the interpretation
is either that u is not exceptionally large or that
if it is, then s is also. In such a case it would
evidently be unwise to conclude that the error in
x̄ can easily be as great as the postulated value
of u unless there is good reason to believe that
the S.D. of the sample is not unusual.

If the S.D. of the sample happens to be
exceptional, the u and z tests will give different
results regarding the proposed value of u, and it
is the latter test that will be misleading. Without
even a guess as to where σ lies there is no way of
surmising whether s is or is not extraordinary and
the z test will accordingly be hazardous when
considering the error of the sample. On the other
hand, if there is some fairly definite knowledge
concerning σ, the u test can be applied; the z test
is in this case irrelevant except that it serves as an
indication of whether the S.D. of the sample is or
is not extraordinary. If the sample is not
exceptional, the u and z tests will indicate
substantially the same conclusions; and conversely,
if the sample is exceptional they will
disagree.

This has an important bearing in those
problems in physics wherein, having given the
mean x̄ and the S.D. s of n observations, we seek
merely the probability that the error in x̄ could
be as great or greater than a proposed error u; in
other words, where we seek the probability that
the given sample could have been drawn from
some normal parent population having its mean
at μ (= x̄ − u) and having any S.D. whatever. This
is a natural question to ask, especially when
there is no information at hand concerning σ.
Now since σ is not needed for the z test, it might
seem that here is a criterion peculiarly adapted
to the problem. Unfortunately, however, this is
not so. For although its value may be unknown,
nevertheless the parent population does have a
certain S.D., and as has just been learned, it is
necessary to have some notion what this S.D. is
before the proposed error u can be judged with
confidence on the basis of the z test. Evidently,
then, it is impossible to make any progress
without postulating some value of σ, and all
conclusions respecting the error, whether drawn
from the u or the z test, will depend on this
postulate.

Fig. 9. Chart for making the z test: nomographic representation of the
probability P_z that the error in the mean of a sample of n, measured in
terms of its S.D., is greater in magnitude than z. Published by permission
of the Bell Telephone Laboratories, Inc.

The z test simply tells whether the value of
u/s obtained in a given sample is extraordinarily
large or small, and for this, it is, of course,
perfectly valid. Usually, however, we are more
interested in knowing whether u = x̄ − μ is
exceptionally large or small, and the trouble with
testing z = u/s is that z expresses u in units of s,
which is itself a variable, being subject to the
fluctuations of sampling according to Eq. (14).

Careful  considerations  of  the  u,  s and  z tests 
will  generally  disclose  about  all  the  information 
concerning  the  parent  population  that  the  sample 
alone  is  capable  of  giving.  Any  one  of  the  three 
tests  by  itself  may  be  misleading,  because  they 
all  possess  an  inherent  weakness  owing  to  the 
fact  that  the  contours  on  which  they  depend 
extend  to  infinity. 

An important contribution was made by J.
Neyman and Egon S. Pearson when they
developed a single test depending on a unique
family of closed contours for the probability
associated with a proposed parent population.
They devised for this purpose the λ contours, and
the test depending on them will be called the
“λ test.” Along a λ curve the ratio of the altitude

J.  Neyman  and  Egon  S.  Pearson,  Biometrika  20a, 
175-241  (1928).  The  diagrams  and  tables  published  by 
Neyman  and  Pearson,  together  with  remarks  on  their  use, 
will  be  found  in  Tables  for  Statisticians  and  Biometricians, 
Part  II. 


at any point of the u,s frequency surface to the
maximum value that it can be made to take (by
putting u = 0 and σ = s) remains constant. The
fraction of the volume under the u,s frequency
surface lying outside the λ contour drawn through
the point (u,s) is

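$$
P_\lambda = \frac{n^{\frac{1}{2}n}}{\sqrt{2\pi}\,\Gamma[\frac{1}{2}(n-1)]\, 2^{\frac{1}{2}(n-3)}\, \sigma^n}
            \iint s^{n-2}\, e^{-n(u^2+s^2)/2\sigma^2}\, du\, ds,
\tag{32}
$$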

the integral being taken outside the λ curve.
Neyman and Pearson published values of P_λ as a
function of n, u/σ, and s/σ. By means of their
diagram and table the λ test is as easy to apply as
any of the others. When P_λ turns out to be small,
the hypothesis respecting μ or σ, or both, appears
questionable. The diagram published by Neyman
and Pearson enables the computer to ascertain at
a glance just where the trouble lies when P_λ
turns out to be small.

A fifth test is provided by the δ contours of Eq.
(25), but the difference between P_δ and P_λ is
insignificant, and there is a theoretical reason
why the λ contours are better suited to the
purpose. The δ contours are curves of equal
altitude on the u,s frequency surface, but for
small values of n they would not be curves of
equal altitude on a frequency surface plotted
against u and some other function of s, such as s².
But the significance of the λ contours is
always the same, regardless of the coordinate
system. As n increases, the δ and λ contours
approach coincidence; in fact at n = 10 they are
already very close together.

The significance of each test depends not only
on the value of P (P_u, P_s, ···) that is found, but
also on how much is known a priori regarding the
parent population. A hypothesis regarding μ and
σ cannot be accepted merely because the tests
give high values of P, for it may seem wise to
abandon this hypothesis in favor of one that
leads to smaller values of P but which is a priori
more logical or has a more rational basis. For this
reason considerable caution must be exercised
before accepting a hypothesis purely on the basis
of any one or all of these tests. High values of P
simply show that there are no grounds for
rejecting the proposed values of μ and σ on the
basis of these criteria alone.
On the other hand, a very low value of P does
not  present  such  difficulties,  for  it  forces  us, 
regardless  of  a priori  considerations,  either  to 
admit  that  the  sample  is  exceptional  or  to  regard 
the  hypothesis  with  suspicion.  Which  one  of 
these  alternatives  is  to  be  chosen  will  depend  for 
one  thing  on  how  compelling  were  the  reasons  for 
selecting  the  particular  hypothesis  in  the  first 
place.  It  is  thus  clear  that  statistical  tests  are 
more  readily  useful  for  rejecting  a hypothesis 
than  for  accepting  one. 

In  rejecting  a hypothesis  we  may  reject  one 
that is true; in accepting one we may accept one
that  is  false.  The  frequency  of  the  former  mistake 
can  be  controlled  to  a large  extent  by  lowering 
the  limit  for  rejection.  Thus,  if  we  decide  to  reject 
a hypothesis  when  any  P<0.01,  we  shall  commit 
the  mistake  of  rejecting  a perfectly  good  one  on 
the  average  of  once  in  100  such  tests,  but  by 
lowering  the  rejection  limit  to  0.001  we  lower  this 
average to once in 1000. It is, however, impossible
to control so easily the mistake of accepting
a hypothesis  on  the  basis  of  a high  value  of  P 
when  actually  it  is  false,  for  there  will  always  be 
false  hypotheses  that  give  higher  values  of  P than 
the  true  one  gives,  so  that  it  is  impossible  to 
distinguish  between  the  true  and  the  false  by 
objective  tests  alone.  Methods  for  making 
quantitative  use  of  other  information  concerning 
the  hypotheses  under  test  have  been  devised  by 
J. Neyman and Egon S. Pearson, who have
given  an  excellent  discussion  of  this  whole 
subject. 
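The statement that a rejection limit of 0.01 leads to the rejection of a true hypothesis about once in 100 tests can be checked by simulation. The sketch below is an illustration only: it draws samples from a normal population for which the hypothesis is true and applies the u test to each. The population values, the number of trials, and the seed are arbitrary choices made here.

```python
import math
import random


def p_u(u, sigma, n):
    """Two-sided normal tail probability of Eq. (27)."""
    return math.erfc(abs(u) / (sigma / math.sqrt(n)) / math.sqrt(2.0))


def false_rejection_rate(mu, sigma, n, limit, trials, seed=0):
    """Fraction of samples from a true hypothesis that give P_u < limit."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(trials):
        mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if p_u(mean - mu, sigma, n) < limit:
            rejected += 1
    return rejected / trials


# With the hypothesis true, the rejection rate should lie close to the limit.
rate = false_rejection_rate(1.0740, 0.0040, 10, 0.01, 20000)
print(rate)
```

No comparable simulation settles the opposite mistake, because the frequency of accepting a false hypothesis depends on which false hypothesis is actually in force.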

In  some  cases  it  is  more  important  to  avoid 
rejecting  a hypothesis  that  is  true  than  it  is  to 
avoid  accepting  one  though  it  be  false;  and  in 
other cases just the reverse is true. The seriousness
of either mistake depends on the action that
is  to  follow  the  decision  and  on  the  interests 
involved.  A clear  illustration  of  this  statement  is 
found  in  the  conflicting  interests  of  producer  and 
consumer  in  the  results  of  sampling  tests  on  a 
consignment of goods. For the proposed hypothesis
we might say that the consignment which is
sampled  complies  with  certain  specifications; 
then  a low  rejection  limit  works  to  the  advantage 
of  the  producer  but  to  the  disadvantage  of  the 

J. Neyman and Egon S. Pearson, Phil. Trans. Roy.
Soc.  A231,  289-337  (1933);  Proc.  Camb.  Phil.  Soc.  29,  492- 
510  (1933).  See  also  Thornton  C.  Fry,  Probability  and  Its 
Engineering  Uses,  pp.  269-270  (Van  Nostrand,  1928). 


consumer,  whereas  a high  rejection  limit  does 
just  the  opposite. 

As  an  example  for  illustrating  the  application 
of  the  different  tests  let  us  consider  the  following 
10  readings  made  on  a micrometer:  1.078,  1.080, 
1.071,  1.076,  1.081,  1.077,  1.075,  1.073,  1.079, 
1.070.  There  is  reason  to  suppose  that  these  are  of 
equal  reliability,  so  they  will  be  given  equal 
weight. Their mean is x̄ = 1.0760 and their S.D.
s = 0.00355.²⁶
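The mean and S.D. just quoted can be verified in two ways: directly from the definitions (with divisor n in the S.D., as used throughout this paper), and by the shortcut of working with departures from a convenient datum and then correcting. The sketch below is illustrative; the function names and the choice of 1.075 as datum are assumptions made for the example.

```python
import math

readings = [1.078, 1.080, 1.071, 1.076, 1.081,
            1.077, 1.075, 1.073, 1.079, 1.070]


def mean_and_sd(xs):
    """Mean and S.D. from the definitions (S.D. uses divisor n)."""
    n = len(xs)
    m = sum(xs) / n
    return m, math.sqrt(sum((x - m) ** 2 for x in xs) / n)


def mean_and_sd_from_datum(xs, datum):
    """Same quantities from departures d = x - datum.

    mean = datum + mean(d);  s^2 = mean(d^2) - mean(d)^2.
    """
    n = len(xs)
    d = [x - datum for x in xs]
    d_bar = sum(d) / n
    s2 = sum(v * v for v in d) / n - d_bar ** 2
    return datum + d_bar, math.sqrt(s2)


m1, s1 = mean_and_sd(readings)
m2, s2 = mean_and_sd_from_datum(readings, 1.075)
print(round(m1, 4), round(s1, 5), round(m2, 4), round(s2, 5))
```

The shortcut is only a matter of convenience in hand computation; both routes give the same mean and S.D. apart from rounding.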

Let us first consider the hypothesis that the
sample was drawn from a parent population with
true mean 1.0740. If this is the case, the true
error of the mean of our sample is +0.0020, and
we may now ask the question, what is the chance
that the true error could be as large as or larger
than 0.0020? Without some knowledge concerning
σ the only thing we can do is to postulate
that the sample was not extraordinary, and apply
the z test. If u = +0.0020 or greater, then
u/s = +0.0020/0.00355 = +0.563 or greater. Now
with n = 10 and z = 0.563, Fig. 9 shows that
P_z = 0.13. So in about 1 out of 8 samples of 10,
|u/s| will be as large as or larger than 0.563, or in
1 out of 16 samples, u/s will be as large as or
larger than +0.563. Hence on the assumption
that the S.D. of the 10 readings is not unusual,
there is no compelling reason to reject the
proposal that if the number of measurements
were to be indefinitely increased, their mean
would finally settle down to the value 1.0740.

Suppose now that there has been some previous
work done by the same observer with the same
instrument, and there is good reason to believe
that σ lies very close to 0.0040. It is clear,
without actually calculating P_s, that 0.00355
was in fact not an extraordinary S.D., for the
average S.D. in samples of 10 drawn from a
normal parent population having σ = 0.0040 is, by
26 One of the slowest ways to compute x̄ and s is to follow
their definitions, i.e., take the sum and divide by n,
and then find the square root of the average squared
residual. Considerable time can be saved by computing
x̄ and s simultaneously by using the departures from some
selected point (instead of from x̄), and then applying a
correction. In this example 1.075 might be selected as a
datum. The departures from this point are 3, 5, −4, 1, 6,
2, 0, −2, 4, −5, all times 10⁻³. The average of these
numbers is +1.0, whence x̄ = 1.075 + 0.0010 = 1.0760. The
sum of their squares is 136; hence, by a well-known
formula in mechanics, s² = (136/10 − 1.0²)·10⁻⁶ = 12.6·10⁻⁶
and s = 0.00355. See Whittaker and Robinson, The Calculus
of Observations, Art. 96 (Blackie and Sons, 1924 and
1926).
Table I, 0.0040 × 0.9227 = 0.0037, which is close to
0.00355. So in this case the conclusion indicated by
the z test can be accepted with confidence. But if
σ is known pretty definitely, the u test is
possible. The probable error of the mean of the ten
measurements is r = 0.674···σ/√10 = 0.00085, so the
ratio u : r = 0.0020 : 0.00085 = 2.34. The ratio
u/(σ/√n) is 0.0020 : 0.0040/√10 = √(5/2) = 1.58.
Either of these ratios enables P_u to be read
quickly from Fig. 8. The result is P_u = 0.114,
which means that there is about 1 chance in 9
that |u| ≥ 0.0020, or that there is about 1 chance
in 18 that u ≥ +0.0020. The u and z tests
therefore concur, as they will when the S.D. of
the sample is not extraordinary.
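The auxiliary numbers used in this paragraph can be reproduced directly. The sketch below assumes the standard result that, for samples of n from a normal population and with divisor n in s, the average S.D. is σ·√(2/n)·Γ(n/2)/Γ((n − 1)/2); the factor 0.9227 quoted from Table I for n = 10 agrees with this. The names used are illustrative.

```python
import math

# 0.674... is the 75th percentile of the standard normal distribution.
PROBABLE_ERROR_FACTOR = 0.6744897501960817


def mean_sd_factor(n):
    """Average S.D. of samples of n, in units of sigma (divisor n in s)."""
    return math.sqrt(2.0 / n) * math.gamma(n / 2.0) / math.gamma((n - 1) / 2.0)


def probable_error_of_mean(sigma, n):
    """Probable error r of the mean of n observations."""
    return PROBABLE_ERROR_FACTOR * sigma / math.sqrt(n)


sigma, n, u = 0.0040, 10, 0.0020
expected_s = sigma * mean_sd_factor(n)      # about 0.0037
r = probable_error_of_mean(sigma, n)        # about 0.00085
ratio_r = u / r                             # about 2.34
ratio_sd = u / (sigma / math.sqrt(n))       # sqrt(5/2), about 1.58
print(round(expected_s, 4), round(r, 5), round(ratio_r, 2), round(ratio_sd, 2))
```

Either ratio leads to the same P_u, since u/r and u/(σ/√n) differ only by the constant factor 0.674···.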

In the preceding paragraphs we have made the
hypothesis that the mean of the parent population
is 1.0740 and on the basis of the z and u
tests have calculated the chance that a sample of
10 with x̄ = 1.0760 or greater (i.e., with u ≥
+0.0020) could have been drawn from such a
parent population. In making the z test it was
necessary to assume that the S.D. of the sample
was not extraordinary, and in the u test to
assume and use some definite value of σ. The
reliance that can be placed on the results depends
entirely on the validity of these assumptions.
When σ is not known very definitely, but reasons
exist for thinking that it may be in the neighborhood
of (e.g.) 0.0040, we might be interested in
the question of what fraction of the samples
drawn from a normal parent population with
μ = 1.0740 and σ = 0.0040 would, on the average,
lie outside the oval shaped λ contour drawn
through the point in the u,s plane corresponding
to the 10 observations. With u = 0.0020 and
s = 0.00355 it is found that P_λ = 0.27, which
means that about 3 out of 11 samples will fall
outside this λ contour. On the basis of the λ test,
then, there is no reason to reject the proposal
that μ = 1.0740 and σ = 0.0040.

For the sake of illustration, it is interesting to
assume that σ = 0.0025 instead of 0.0040. This
will reverse some of the previous conclusions. In
the first place, the S.D. of the sample now
appears to be exceptionally high, for with
v = ns²/2σ² = 10(0.00355)²/2(0.0025)² = 10.1, Eq.
(30) gives

P_s = 1 − Γ_v(9/2)/Γ(9/2) = 0.0168,

which means that in only about 17 samples out of
1000 could the S.D. be as high as or higher than
that found. We may now expect the z and u tests
to disagree. The probable error of x̄ is now only
(0.674···)(0.0025)/√10 = 0.000533, and the proposed
error 0.0020 is accordingly 3.75 times the
probable error, for which P_u is 0.0114, just
about 1/10 of what it was before. So if σ = 0.0025,
an error as large in magnitude as 0.0020 could
occur in only 11 or 12 samples out of 1000, and
the proposal that 1.0740 could be the true value
should be looked upon with suspicion. Certainly
in the face of such odds the proposal could
hardly be other than rejected without some very
forceful arguments to support it. This conclusion
contrasts with that which would be drawn from
the z test, for P_z retains its former value, 0.13.
The disagreement between the u and z tests
shows how misleading the latter would be if used
alone. The trouble comes, of course, from the fact
that the S.D. of the sample is now exceptionally
high.

Finally, we may examine the λ contour on the
double assumption that μ = 1.0740 and σ = 0.0025.
In this case P_λ is found to be 0.013, which is so
low that the assumption appears improbable.
From the position of the sample in the u,s
diagram it is evident that the low value of P_λ
arises almost solely from the high value of the
ratio s/σ.

(3d). Three important relations when P = 1/2

The  u,  s,  and  z tests  lead  to  three  important 
statistical  relations.  If  the  straight  line  contours 
of  Fig.  7 take  positions  such  that  the  volume 
under  the  u,s  frequency  surface  is  divided  sym- 
metrically into  quarters  by  each  of  them,  it  will 
be  an  even  bet  that  a random  sample  will  fall 
inside  or  outside  the  u and  z contours,  and  above 
or  below  the  s contour.  Fig.  10  illustrates  this 
situation. 

In Fig. 10a, P_u = 1/2. The lines AA and BB
effect quartile divisions of every one of the
normal curves obtained by taking sections
s = const. through the u,s frequency surface.
The particular constant value of |u| along these
lines is therefore r = 0.674···σ/√n, the probable
error of the mean of n observations.

In Fig. 10b, P_s = 1/2. The line EE divides into
halves the area under each of the Helmert’s
curves that are obtained by taking sections
u = const. through the u,s frequency surface. The
particular constant value of s along EE is s = σ/f,
the median of Helmert’s distribution, given by
Eq. (18) and Table I, column 2.

Fig. 10. The volume under the u,s frequency surface
can be divided into quarters in several different ways.
Here the division is effected with u, s, and z contours by
setting the shaded areas (to infinity) each equal to 1/2.
In (a) the lines AA and BB are a distance r from the s
axis, r being the “probable error.” r is determined from
the normal probability integral by setting P_u = 1/2. In (b)
the line EE divides all the u = const. curves into halves
and therefore lies the median distance s = σ/f above the u
axis. 1/f is determined from the Tables of the Incomplete
Gamma Function by setting P_s = 1/2 and its values are
given in Table I, column 2. In (c) the lines CO and DO
make equal angles with the s axis, this angle being tan⁻¹ ζ.
ζ is determined from Student’s integral by setting P_z = 1/2,
and its values are given in Table II, column 2. r and s
depend on σ and n both, whereas ζ depends only on n.
z = u/s is constant and equal to ζ along the lines CO
and DO.

In Fig. 10c, P_z = 1/2. The constant value of
z = u/s along the z contours is always the tangent
of the angle that these contours make with the s
axis. In Fig. 10c, the lines OD and OC effect a
quartile division of Student’s distribution of z,
and the particular constant value of |z| along
them is denoted by ζ. Values of ζ for n running
from 3 to 25 have been calculated for us by Lola
S. Deming and are listed in Table II.


The values of ζ in Table II were calculated by putting
z = tan θ and then making successive approximations to
find  the  limits  of  the  integral  written  in  the  heading  of 
Table  II.  The  same  purpose  could  be  accomplished  with 
less  precision  by  inverse  interpolation  in  Student’s  original 
tables  (see  footnote  17)  or  in  later  tables  by  Student  and 
R.  A.  Fisher,  Metron  5,  No.  3,  pp.  90-120  (1925).  Another 
possibility  is  inverse  interpolation  in  the  Tables  of  the 
Incomplete  Beta  Function,  recently  prepared  by  Karl 
Pearson  and  his  staff  (issued  by  the  Biometric  Laboratory, 
University  College,  London,  W.  C.  1,  1934),  but  our 
Table  II  was  calculated  and  used  several  years  before  the 
appearance  of  the  Tables  of  the  Incomplete  Beta  Function. 
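
The substitution z = tan θ described in this note can be sketched in a few lines. This is an illustration only: it assumes that Student's distribution of z has density proportional to (1 + z²)^(−n/2), so that θ has density proportional to cos^(n−2) θ, and it uses Simpson's rule with bisection as the "successive approximations", which is not necessarily the scheme used for Table II. The function names are hypothetical.

```python
import math

def cos_power_integral(m, theta, steps=2000):
    """Simpson's rule for the integral of cos(t)**m from 0 to theta (steps is even)."""
    h = theta / steps
    total = 1.0 + math.cos(theta) ** m          # end points; cos(0)**m = 1
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(i * h) ** m
    return total * h / 3.0

def student_quartile(n):
    """Value of |z| = |u/s| exceeded by half of all samples of n (n >= 3)."""
    m = n - 2
    half = 0.5 * cos_power_integral(m, math.pi / 2)
    lo, hi = 0.0, math.pi / 2
    for _ in range(60):                          # bisection on the angle theta
        mid = 0.5 * (lo + hi)
        if cos_power_integral(m, mid) < half:
            lo = mid
        else:
            hi = mid
    return math.tan(0.5 * (lo + hi))

for n in (3, 4, 10, 25):
    print(n, student_quartile(n))                # compare with Table II, column 2
```

For n = 3 the integral is elementary (sin θ = 1/2), so the result tan 30° = 0.57735 gives a quick check on the routine.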


Table II. The quartile deviation ζ in Student's distribution.
ζ is defined by setting P_z = 1/2 in Student's integral.
Comparison with the normal curve of S.D. 1/√(n − 3/2).

  n       ζ          0.674···/√(n − 3/2)   Discrepancy, percent low

  3    0.577 349     0.550 719             4.612
  4    0.441 614     0.426 585             3.403
  5    0.370 348     0.360 530             2.651
  6    0.324 981     0.317 957             2.161
  7    0.292 942     0.287 603             1.822
  8    0.268 786     0.264 557             1.573
  9    0.249 745     0.246 289             1.384
 10    0.234 241     0.231 348             1.235
 11    0.221 300     0.218 833             1.115
 12    0.210 288     0.208 152             1.016
 13    0.200 768     0.198 896             0.9324
 14    0.192 434     0.190 774             0.8621
 15    0.185 056     0.183 573             0.8014
 16    0.178 467     0.177 130             0.7492
 17    0.172 533     0.171 321             0.7025
 18    0.167 154     0.166 048             0.6617
 19    0.162 249     0.161 234             0.6256
 20    0.157 752     0.156 816             0.5930
 21    0.153 607     0.152 742             0.5631
 22    0.149 774     0.148 970             0.5368
 23    0.146 214     0.145 464             0.5129
 24    0.142 896     0.142 195             0.4906
 25    0.139 794     0.139 137             0.4707

As has already been pointed out, σ does not
enter Student’s distribution of z, hence ζ is
independent of σ and depends only on n.
Further, since the normal curve of S.D.
1/√(n − 3/2) or of probable error
0.674···/√(n − 3/2) was found to be an excellent
approximation to Student’s distribution of z near the
center, we should expect this last expression to
be a good approximation to ζ, provided n is not
too small. The actual discrepancy is given in
Table II, column 4. In practice, the approximation
ζ = 0.674···/√(n − 2) will be found entirely
satisfactory when n > 20, though of course,
0.674···/√(n − 3/2) is always a better one.
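
The discrepancy column of Table II is easy to re-derive from the tabulated quartile deviations. The sketch below assumes that the constant written 0.674··· is the normal quartile 0.6744898, and copies a few entries from column 2 of Table II; the dictionary and function names are illustrative.

```python
import math

PROBABLE_ERROR_FACTOR = 0.6744898   # the constant written 0.674... in the text

# n -> quartile deviation in Student's distribution, copied from Table II
TABLE_II = {3: 0.577349, 10: 0.234241, 20: 0.157752, 25: 0.139794}

def percent_low(n, exact):
    """Percent by which 0.674.../sqrt(n - 3/2) falls short of the exact value."""
    approx = PROBABLE_ERROR_FACTOR / math.sqrt(n - 1.5)
    return 100.0 * (exact - approx) / exact

for n, exact in TABLE_II.items():
    print(n, round(percent_low(n, exact), 3))   # compare with Table II, column 4
```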

If s be computed for each of an indefinitely
large number N of samples, half the values of s
will be less than s = σ/f, and the other half will
be greater, by definition of the median s = σ/f in
Eq. (18). Clearly, then, if fs be computed for
each sample, half the values of fs will be less
than σ and half will be greater. Finally, if
0.674···fs/√n be computed for each sample,
half will be less than r and half will be greater.
It is convenient to denote 0.674···f/√n by
the symbol φ so that φ and φs bear the same
relation to the probable error r that f and fs do
to the S.D. σ. Values of φ are given in the second
column of Table III. The heading of this column
is φ_50, for reasons that will become clear later.

The preceding discussion shows that the
contours in Figs. 10a and 10c correspond to
quartile distances r and ζ on the distributions of
u and z, respectively, and that the contour in
Fig. 10b corresponds to the medians σ/f, σ, and
r on the distributions of s, fs, and φs, respectively.
Hence in the case of a large number of
samples of n observations each, it will be found
that

(a) in (1/2 ± ε₁) the cases, |u| ≷ r;
(b) in (1/2 ± ε₂) the cases, φs ≷ r and fs ≷ σ;
(c) in (1/2 ± ε₃) the cases, |u/s| ≷ ζ,

wherein ε₁, ε₂, ε₃ approach zero as a statistical
limit* as the number of samples is indefinitely
increased; that is, the odds that ε₁, ε₂ or ε₃ shall
differ from zero by less than a stated amount
can be made as great as desired by taking enough
samples. No one can say in advance just how
many samples must be taken in order that ε₁
may be less than (e.g.) 0.01, but it is possible to
find the probability that ε₁ < 0.01 for a given
number of samples.

Fig. 11. For each of 100 samples of 4, ±ζs is laid off in
the vertical from the point that represents x̄. The mean of
the parent population is μ = 0, and its S.D. is unity.
r = 0.674···/√4 = 0.337. The horizontal lines at distances
±r from the true value show the range covered by the
probable error. In 51 out of 100 samples |u| < ζs. In 52
out of 100 samples |u| < r. In 53 out of 100 samples
φs < r. As the number of samples is indefinitely increased,
the fractions of them satisfying these three inequalities
each approach 1/2 as a statistical limit.

The relations (a), (b), and (c) just given can
be stated still more simply as follows. It is an
even bet that for a random sample

(a) |u| > r or |u| < r;
(b) fs > σ or fs < σ, and φs > r or φs < r;
(c) |u/s| > ζ or |u/s| < ζ, and |u| > ζs or |u| < ζs.

The character of each of these quantities, for
any given value of n, is worthy of notice. In (a)
r is a constant while u varies from sample to
sample. In (b) σ and r are constants, while fs and
φs vary from sample to sample. In (c) ζ is a
constant while u/s varies, and in the second
form, both ζs and u vary from sample to sample.
These facts and relations are illustrated in Fig.
11, where the value of x̄ for each of a number of
samples is measured along the vertical and
marked by a heavy dot, then the distance ζs
for the sample is laid off in the vertical above
and below the dot. Thus a vertical line of length
2ζs with center at x̄ marks each sample. In Fig.
11 these lines represent the first 100 samples of
4 drawn from a normal parent population of
S.D. σ = 1 and mean μ = 0.** From Table II,
ζ = 0.4416 when n = 4.

It will be noticed that in 51 out of the 100
samples, the range ±ζs measured from x̄
overlaps the true value μ = 0. For a random
sample, there is by relation (c) above an even
chance that |u| < ζs, so we should expect to find
approximately half these ranges to overlap μ = 0.

A pair of horizontal lines equally spaced at a
distance r = 0.674···σ/√4 = 0.337··· above and
below the true value μ = 0 show the range covered
by the probable error. Before a sample is drawn,
it is an even bet by relation (a) above that
|u| < r, and it is interesting to note that 52 dots
fall inside the range ±r and 48 fall outside. If
for each sample in the figure the range ±φs
were laid off from the horizontal line μ = 0, it
would be found that φs < r in 53 samples, φs > r
in 46 samples, and φs = r to three digits in one
sample. (To avoid confusion, the ranges ±φs
are not shown on the figure.)

** These are listed in W. A. Shewhart’s book, The
Economic Control of Quality, Table D, page 454 (Van
Nostrand, 1931). The authors are indebted to Dr. Shewhart
for the idea of this figure. It was first exhibited by him
at a joint meeting of the American Mathematical Society
and Section K of the A. A. A. S. in Atlantic City,
December 27, 1932.

Thus the 100 values of |u| and the 100 values
of φs are separately about equally divided each
side of the probable error r. At the same time the
100 values of |u/s| are about equally divided
each side of ζ, so that about half the 100 values
of |u| are greater than the corresponding values
of ζs. If the number of samples were indefinitely
increased, the ratio of the number for which
|u| > r to the number for which |u| < r would
approach unity as a statistical limit, and the
same can be said of the other inequalities written
in (b) and (c).
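
The three even bets can be tried out with artificial samples in the spirit of Fig. 11. The sketch below is not a reconstruction of the 100 samples in the figure: the random seed, the number of samples, and the function name are arbitrary choices, and it assumes that s is the sample S.D. computed with divisor n. The constants are the quartile deviation in Student's distribution for n = 4 (0.441614, Table II) and the median factor f for n = 4 (1.300244, Table III).

```python
import math
import random

def even_bet_fractions(num_samples=20000, n=4, zeta=0.441614, f=1.300244, seed=1934):
    """Fractions of samples satisfying |u| < r, phi*s < r and |u| < zeta*s."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0
    r = 0.6744898 * sigma / math.sqrt(n)       # probable error of the mean
    phi = 0.6744898 * f / math.sqrt(n)         # median factor for the probable error
    counts = {"|u| < r": 0, "phi*s < r": 0, "|u| < zeta*s": 0}
    for _ in range(num_samples):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(xs) / n
        s = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)   # divisor n (assumed)
        u = mean - mu
        counts["|u| < r"] += abs(u) < r
        counts["phi*s < r"] += phi * s < r
        counts["|u| < zeta*s"] += abs(u) < zeta * s
    return {name: count / num_samples for name, count in counts.items()}

for name, fraction in even_bet_fractions().items():
    print(name, round(fraction, 3))            # each fraction should be near 1/2
```

With only 100 samples, as in the figure, counts such as 51, 52 and 53 are typical; the fractions settle toward 1/2 only as the number of samples grows.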

(3e). Fiducially related values of σ and s

Closely related to the tests that have previously
been described is the notion of fiducially
related values of σ and s. The adjective fiducial
was introduced in 1930 by Fisher* for the
description of a certain objective relation that
exists between a parameter of the parent population
and the corresponding parameter of a
sample when the sampling distribution of the
latter depends only on the former. Such is the
case with σ and s. Thus, if a set of n observations
has been taken and the S.D. is found to be s,
we can arbitrarily put P_s = 0.95, using the
observed value of s for the limit of integration in
Eq. (30), and then make the perfectly objective
statement that there is only 1 chance in 20 that
the S.D. of the parent population can be greater
than the value of σ required to be used in the
integral. This is the same thing as drawing the
s contour of Fig. 7 at a distance from the u axis
equal to the observed S.D. s, and then arbitrarily
selecting for σ that value which will put 95
percent of the volume of the u,s frequency surface
above the contour and the remaining 5 percent
below it. These particular values of σ and s are
accordingly so related to each other that if σ
were actually the S.D. of the parent population
then there would be 19 chances in 20 that a
sample drawn therefrom would have a S.D. as
large as or larger than s; and conversely, since
s has actually been observed, there is only 1
chance in 20 that the S.D. of the parent population
is as large as or larger than σ.

* R. A. Fisher, Proc. Camb. Phil. Soc. 26, 528-535 (1930).

The value of σ required to be used in the
integrals of Eq. (30) will for a given value of n
be a function both of P_s and of the limit of
integration s, so it seems desirable that the
nomenclature for fiducial values should express
this fact. If P_s has been placed equal to 0.95, we
designate the required value of σ by the symbol
σ(s,5) and call it “the 5 percent fiducial value of
σ corresponding to the given value of s,” because
there are 5 chances in 100 that the S.D. of the
parent population is greater than σ(s,5) for the
given value of s. Likewise the value of s required
to be used as a limit of integration in the same
equation will be a function of σ and P_s, so when
P_s = 0.95 we denote the required value of s by
the symbol s(σ,95) and call it “the 95 percent
fiducial value of s corresponding to the given
value of σ,” because there are 95 chances in 100
that the S.D. of the sample is greater than
s(σ,95) for the given value of σ.

Now it so happens that in the incomplete
gamma function to which Eq. (30) reduces, s
and σ occur only in the ratio s : σ. This ratio
will of course be a function of P_s for a given
value of n. If, then, for P_s = 0.95 this ratio be
denoted by 1/f_95, Eq. (30) gives

$$\frac{1}{\Gamma[\tfrac{1}{2}(n-1)]}\int_0^{n/2f_{95}^2} e^{-x}\,x^{(n-3)/2}\,dx = \frac{\Gamma_{n/2f_{95}^2}[\tfrac{1}{2}(n-1)]}{\Gamma[\tfrac{1}{2}(n-1)]} = 0.05, \tag{34}$$

from which the numerical evaluation of f_95 for
different values of n can be accomplished. When
n < 9, the most satisfactory method seems to be
to integrate in series, retaining enough terms to
give the accuracy desired, and then to solve for
n/2f_95² by any scheme that happens to be
suitable for finding the numerical roots of the
resulting algebraic equation. When n > 9,
interpolation in the Tables of the Incomplete Gamma
Function by means of a central difference
formula will give 7 place accuracy. Values of f_95
obtained by a combination of these methods are
shown in Table III for n running from 2 to 25.
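
With a machine the same root can be found directly. The following is a minimal sketch under stated assumptions, not the procedure described above: it sums the incomplete gamma ratio from its power series for every n and locates the limit of integration by bisection. The function names and the bracket [0, 100] are illustrative choices.

```python
import math

def reg_lower_gamma(a, x):
    """Incomplete gamma ratio Gamma_x(a)/Gamma(a), summed from its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-17 * total and k < 10000:
        term *= x / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def fiducial_factor(n, tail=0.05):
    """Factor f for which the ratio with limit n/(2 f^2) and index (n-1)/2 equals tail."""
    a = (n - 1) / 2.0
    lo, hi = 0.0, 100.0                 # bracket for the limit x = n / (2 f^2)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(a, mid) < tail:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    return math.sqrt(n / (2.0 * x))

for n in (2, 10, 25):
    print(n, fiducial_factor(n, 0.50), fiducial_factor(n, 0.05))
```

Setting `tail=0.50` gives the median factor, and `tail=0.05` the 5 percent factor, so the output can be compared with the corresponding columns of Table III.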



These values of f_95 provide the reciprocal
relation between the S.D. of the parent population
and the S.D. of the sample that has been
described above: when a sample of n shows a
S.D. s, there is only 1 chance in 20 that the S.D.
of the parent population whence it came can be
greater than f_95 s, and conversely, if the S.D. of a
parent population is σ, there are 19 chances out
of 20 that the S.D. of a sample of n drawn
therefrom will be greater than σ/f_95.

The notion of fiducial values can easily be
extended to the probable error of the mean of n
observations, for if there is only 1 chance in 20
that the S.D. of the parent population is greater
than f_95 s, there is the same chance that the
probable error of the mean of n observations is
greater than

$$r(s,5) = 0.674\cdots f_{95}\,s/\sqrt{n} \equiv \phi_{95}\,s, \tag{35}$$

which may accordingly be termed “the 5 percent
fiducial value of r corresponding to the given
S.D. s.” The factor 0.674···f_95/√n is denoted
by φ_95, as just indicated, and its values for n
between 2 and 25 are listed alongside the values
of f_95 in Table III. The factor φ_95 gives a very
useful relation, because although the value of σ,
and hence that of r, may be unknown, we can
be “19/20 sure” that r is not greater than r(s,5)
as calculated in the last equation. Thus, to go
back to the 10 observations previously under
consideration, since their S.D. is 0.00355 there is
only 1 chance in 20 that the probable error of
their mean x̄ is greater than 0.3699 × 0.00355
= 0.00131. The values of φ_95 in Table III make
the calculation of 5 percent fiducial values of r
a very simple matter.
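
The worked figure for the 10 observations follows from Eq. (35) in one line. In this short sketch the constant 0.6744898 is assumed for 0.674···, the 5 percent factor 1.734191 for n = 10 is copied from Table III, and the function name is illustrative.

```python
import math

def fiducial_probable_error(s, n, f):
    """0.674... * f * s / sqrt(n): fiducial value of the probable error of the mean."""
    return 0.6744898 * f * s / math.sqrt(n)

F_95_N10 = 1.734191                                       # Table III, n = 10

phi_95 = fiducial_probable_error(1.0, 10, F_95_N10)       # multiplier of s
r_5 = fiducial_probable_error(0.00355, 10, F_95_N10)      # for the observed S.D.
print(round(phi_95, 4), round(r_5, 5))
```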

With the notation here introduced, extension
to other fiducial points can be conveniently
accomplished. Thus with some value of P_s other
than 0.95, the subscripts for f and φ can be
changed to the new percentage; likewise σ(s,5),
s(σ,95), r(s,5) can be rewritten to correspond
with the new value of P_s. In particular, the 50
percent point is of special interest, for it corresponds
to the median of Helmert’s distribution
of s, as is evident from a comparison of Eqs.
(19) and (30). The values of 1/f_50 are accordingly
just those ratios of s/σ that were labeled 1/f
in Eqs. (18), (19), and Table I, and the corresponding
factors φ_50 = 0.674···f_50/√n are
just those that were denoted by φ in the preceding
section. The median of Helmert’s curves
is so frequently used that for brevity and convenience
the subscript 50 will ordinarily be
omitted, so that except when emphasis is desired,
f_50 and φ_50 will appear simply as f and φ.

Values of f_50 for n between 2 and 25 are shown
in the second column of Table III; these are,
of course, simply the reciprocals of 1/f in Table I.
Alongside these are shown the factors φ_50
= 0.674···f/√n. (Later on, φs will have still
another significance, and it will be convenient
to have φ_50 retabulated in Table IV for comparison
with two other functions yet to be
introduced.)

Table III. Fiducial values of σ and s. Multiplying
factors for getting the 5 and 50 percent fiducial values of σ,
and the 5 and 50 percent fiducial values of the probable
error r, corresponding to a given S.D. s in a sample of n.
f_95 is defined as the ratio of the 5 percent fiducial value of σ
to the observed value of s. f_95 is obtained by setting
P_s = 0.95, whereupon Eq. (30) gives

$$\frac{1}{\Gamma[\tfrac{1}{2}(n-1)]}\int_0^{n/2f_{95}^2} e^{-x}\,x^{(n-3)/2}\,dx = 0.05. \tag{34}$$

The 5 percent fiducial value of σ is f_95 s, and the 5 percent
fiducial value of r is accordingly

$$r(s,5) = 0.674\cdots f_{95}\,s/\sqrt{n} = \phi_{95}\,s. \tag{35}$$

The odds are 19 : 1 that r is not greater than φ_95 s. f_50
and φ_50 (or simply f and φ) are defined in a similar manner
by setting P_s = 0.50. 1/f_50 is then just the median value of
s/σ, and has already been given in Table I. The odds are
even that r is not greater than r(s,50) = φ_50 s, which is the
50 percent fiducial value corresponding to the given
S.D. s.

  n     f_50         φ_50 = 0.674···f_50/√n     f_95          φ_95 = 0.674···f_95/√n

  2   2.096 716      1                         22.552 803    10.756 2497
  3   1.471 008      0.572 8587                 5.353 057     2.084 5706
  4   1.300 244      0.438 5007                 3.371 735     1.137 1005
  5   1.220 476      0.368 1455                 2.652 372     0.800 0640
  6   1.174 244      0.323 3389                 2.288 667     0.630 2057
  7   1.144 059      0.291 6586                 2.068 899     0.527 4310
  8   1.122 797      0.267 7514                 1.921 235     0.458 1533
  9   1.107 009      0.248 8888                 1.815 807     0.408 0230
 10   1.094 821      0.233 5170                 1.734 191     0.369 8896
 11   1.085 127      0.220 6783                 1.670 828     0.339 7901
 12   1.077 232      0.209 7462                 1.619 586     0.315 3470
 13   1.070 678      0.200 2916                 1.577 196     0.295 0457
 14   1.065 150      0.192 0093                 1.541 478     0.277 8745
 15   1.060 424      0.184 6755                 1.510 922     0.263 1308
 16   1.056 338      0.178 1222                 1.484 443     0.250 3104
 17   1.052 769      0.172 2201                 1.461 245     0.239 0418
 18   1.049 626      0.166 8682                 1.440 730     0.229 0455
 19   1.046 836      0.161 9858                 1.422 439     0.220 1062
 20   1.044 343      0.157 5083                 1.406 011     0.212 0553
 21   1.042 102      0.153 3825                 1.391 165     0.204 7596
 22   1.040 077      0.149 5648                 1.377 670     0.198 1113
 23   1.038 238      0.146 0186                 1.365 341     0.192 0227
 24   1.036 560      0.142 7132                 1.354 027     0.186 4220
 25   1.035 023      0.139 6225                 1.343 599     0.181 2487

When the S.D. of a sample of n turns out to
be s, there is an even chance that σ < fs, and
that r < φs. This statement is only a repetition
of relation (b) in the previous section, but it is
now seen that the even odds that were obtained
by placing the line EE in Fig. 10 at the median
are only one of an infinite set of odds that can be
laid on pairs of values of σ and s through the
fiducial relation. In practice, it has been found
that the odds 1 : 1 and 19 : 1, given by f_50 and
f_95, will yield sufficient information. Thus,
although we may know nothing beforehand
concerning σ, we can by a glance at Table III
say that the probable error of 20 observations
is as likely as not to be greater than φs = 0.158 s,
but that there is only 1 chance in 20 that it is
greater than φ_95 s = 0.212 s. Since these two
multiples of s are so close together, we can be fairly
confident that the probable error of the 20
observations is in the neighborhood of φs. On
the other hand, while the probable error of 3
observations is as likely as not greater than
0.573 s, it has 1 chance in 20 of being greater
than 2.085 s. On account of the disparity between
these last two multiples of s (they differ
almost fourfold), we should be extremely
cautious about assigning any value to the
probable error of 3 observations, on the basis of
their S.D. alone.
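
The contrast between 20 and 3 observations can be put in numbers by taking the ratio of the two multiples of s. The sketch below simply divides entries copied from Table III; the dictionary and function names are illustrative.

```python
# n -> (50 percent multiplier of s, 5 percent multiplier of s), copied from Table III
TABLE_III_PHI = {3: (0.5728587, 2.0845706), 20: (0.1575083, 0.2120553)}

def spread_ratio(n):
    """Ratio of the 5 percent to the 50 percent fiducial value of r, for a given n."""
    phi_50, phi_95 = TABLE_III_PHI[n]
    return phi_95 / phi_50

for n in sorted(TABLE_III_PHI):
    print(n, round(spread_ratio(n), 2))   # about 3.64 for n = 3, about 1.35 for n = 20
```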

It is interesting to note from Table III that
the values of f_50 and f_95 are widely different when
n is small, but that they both approach unity
monotonically and are not so greatly different
toward the end of the table. The approaching
coincidence of f_50 and f_95 is, of course, brought
about by the tendency of the Helmert curves to
become more and more concentrated about the
abscissa s/σ = 1 as n increases, as is illustrated
by the curves in Fig. 4. This shows that as n
increases, the fluctuations in s are confined
more and more to a narrow band about σ.

§4.  The  Estimation  of  the  Probable  Error 

(4a).  Introduction 

R. A. Fisher has divided the problems of
statistics into three classes: (a) the specification
of the form of the frequency curve of the parent
population, and of the necessary parameters;
(b) the distribution of various properties (means,
errors, standard deviations, etc.) of samples
drawn from a given parent population; (c) the
estimation of the parameters of the parent population
from information provided, at least in part,
by  the  sample.  The  first  and  second  class  can  be 
handled  independently  of  the  third,  but  the  third 
is  intimately  related  to  the  others.  In  this  treat- 
ment of  the  theory  of  errors,  the  problem  of 
specification  was  disposed  of  by  making  the 
assumption  that  the  parent  population  of  ob- 
servations is  normal.  The  simultaneous  dis- 
tribution of  errors  and  standard  deviations  in 
samples  was  then  found,  and  certain  deductions 
were  drawn  from  it. 

These deductions are most conveniently expressed
in terms of the u, s, z, and λ tests, and
by means of the fiducial relation between σ and s,
which have been described in the preceding
sections. These tests lead to statements such as
the following, “If the S.D. of the parent population
is σ, then there is not more than one
chance in 100 that the error in x̄ could be as
large as the proposed value of u,” or “It is an
even bet that the error in x̄ is not more than
ζs.” Such statements are entirely objective, and
involve none of the risks of estimation. These
tests make no pretense of estimating σ; the u
test, for example, though it depends on σ, simply
finds the odds against the occurrence of an error
as large as or larger than the proposed error, and
the odds so found will of course vary as σ varies.

The parent population of observations is, by
assumption, normal, and is therefore completely
specified by the three parameters v, μ and σ.
When a set of n observations is taken, their
mean x̄ differs from μ by an unknown error u.
Odds against the occurrence of an error as large
as or larger than a given magnitude can be found
by the u test, but, as has been noted, the results
of this test depend on the value of σ chosen for
the purpose. Clearly, then, it is desirable to use
a value of σ that is as close as possible to the
actual S.D. of the parent population. It is the
purpose of any process of estimation to provide
a value of σ that will make the u test valid, or,
what is the same thing, to provide an estimate
of the probable error of x̄.

The  problem  of  estimation  has  necessarily  been 
deferred  to  the  last,  since  it  is  a process  of  at- 
tempting to  reckon  from  the  sample  back  to  the 
parent  population,  and  therefore  depends  on  the 
distribution of u and s. It is a problem that
involves all the entanglements of induction.
There are three methods of attempting to say
something about the parent population: maximum
likelihood, empirical or arbitrary schemes,
and  the  posterior  method.  The  first  two  disregard 
all  prior  knowledge  and  base  the  estimate  purely 
on  the  sample.  The  last  one  utilizes  the  methods 
of  Bayes  and  Laplace  to  combine  previous  ex- 
perience or  knowledge  with  the  information 
contained  in  the  sample.  As  the  size  of  the  sample 
increases,  the  results  provided  by  all  these 
methods  become  indistinguishable.  The  three 
methods  will  be  treated  here  in  the  order  named. 

(4b).  Maximum  likelihood 

It is evident from Helmert’s Eq. (14) and the
curve for n = 6 in Fig. 3 that when a sample of n
is drawn from a parent population having a
certain S.D. σ, the S.D. s of the sample may lie
anywhere between 0 and ∞, whether σ be large
or small.* It is further evident that a sample of
S.D. s may have come from any one of an infinite
number of parent populations. Out of this
infinity of parent populations there is a particular
one that is most favorable to the given sample;
that is, there is a particular one for which the
probability of drawing a sample of S.D. s
is greater than for any other parent population.
To arrive at this particular one, Helmert
simply found the value of σ that makes y in Eq.
(14) a maximum for the given value of s by
setting dy/dσ = 0. The necessary relation between
s and σ is easily found to be

$$\sigma = s\,[n/(n-1)]^{1/2}. \tag{36}$$

This value of σ, which will be called σ_s, may
be adopted as an estimate of the unknown S.D.
σ of the parent population. When it is substituted
into Eq. (13) and used with the definition of s
in Eq. (5), it gives

$$r_s = 0.674\cdots s/(n-1)^{1/2} = 0.674\cdots \left[\sum (x_i-\bar{x})^2 / n(n-1)\right]^{1/2} \tag{37}$$

for an estimate of the probable error r of n
equally reliable observations. The subscript s,
attached to any quantity such as σ or r, signifies
that the quantity is an estimate derived from the
sample alone. The factor 0.674···/(n − 1)^½ is
tabulated in the second column of Table IV for
n between 2 and 25.

* Since the least count of any measuring instrument
must be finite, the S.D. of a sample will in practice have
an upper limit.

Eq. (37) is a familiar formula. In textbooks
it is usually called the “formula for the probable
error,” but it should be carefully noted that this
is a misnomer; r_s is not the probable error r of x̄,
it is an estimate of r, and only one of many
possible estimates. Failure to realize this is very
likely responsible for the disrepute of “probable
error” in some quarters. Just as x̄ is an estimate
of μ, and is subject to statistical fluctuations for
which r is a convenient measure, so r_s is an
estimate of r, and is similarly subject to statistical
fluctuations the measure of which will be
described presently. When n is small these
fluctuations are serious. As n increases, they become
less and less bothersome, for we have seen from
the curves of Fig. 4 that as n increases s becomes
more and more restricted to the neighborhood of
σ, so that the estimate r_s becomes more and
more restricted to the true probable error r.

The introduction of the factor [n/(n − 1)]^½ in
Eq. (37) is called “Bessel’s correction,” since it
seems to have been first used by Bessel. The
history of just how and when he derived it is at
present obscure. The process that Helmert used
in deriving Bessel’s correction has been named
by R. A. Fisher the “method of maximum
likelihood,” and the estimate so obtained the
“optimum value”; Eq. (37) then gives the
“optimum estimate of r.” Another interpretation
of the relation between s and σ in Eq. (36) will
be given in the derivation of Eq. (42).
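
The maximizing step can be checked numerically. Eq. (14) is not reproduced in this section, so the sketch below assumes the usual form of Helmert's distribution, in which the only factors involving σ are σ^−(n−1) exp(−ns²/2σ²), with s computed with divisor n. Under that assumption a coarse grid search over σ should land on s[n/(n − 1)]^½. All function names are illustrative.

```python
import math

def log_helmert_density(s, sigma, n):
    """Logarithm of the assumed form of Eq. (14), up to a constant free of sigma."""
    return (n - 2) * math.log(s) - (n - 1) * math.log(sigma) - n * s**2 / (2.0 * sigma**2)

def optimum_sigma(s, n):
    """The estimate of Eq. (36): s * [n/(n-1)]**0.5."""
    return s * math.sqrt(n / (n - 1))

def optimum_probable_error(observations):
    """The estimate of Eq. (37), computed from the residuals about the mean."""
    n = len(observations)
    mean = sum(observations) / n
    sum_sq = sum((x - mean) ** 2 for x in observations)
    return 0.6744898 * math.sqrt(sum_sq / (n * (n - 1)))

s, n = 0.00355, 10
grid = [s * (0.5 + 0.001 * i) for i in range(2501)]       # sigma from 0.5 s to 3 s
best = max(grid, key=lambda sigma: log_helmert_density(s, sigma, n))
print(best / s, optimum_sigma(s, n) / s)                  # the two ratios should agree closely
```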

(4c).  Empirical  estimates 

There are other methods of attempting to reckon from the sample alone what the S.D. of the parent population actually was. One might arbitrarily assume that the observed S.D. s of the sample is the average of all those that would be observed if a very large number of samples were to be drawn. Geometrically this is equivalent to placing the observed value s at the mean $\bar{s}$ of the S.D. frequency curve (Eq. (14) and Fig. 3). If this is done, the estimate of $\sigma$ is, by Eq. (16),

R. A. Fisher, Messenger of Mathematics 41, 155-160 (1912).

R. A. Fisher, Proc. Camb. Phil. Soc. 22, 700-725 (1925); 26, 528-535 (1930); 28, 257-261 (1932).




$$\sigma_s = s\,\sqrt{n/2\pi}\;B\!\left[\tfrac{1}{2}(n-1),\,\tfrac{1}{2}\right] \;\rightarrow\; s\Big/\left(1 - \frac{3}{4n} - \frac{7}{32n^2}\,\cdots\right) \tag{38}$$

and by introducing this into Eq. (13), the corresponding estimate of the probable error r is

$$r_s = 0.674\ldots\; s\,\sqrt{1/2\pi}\;B\!\left[\tfrac{1}{2}(n-1),\,\tfrac{1}{2}\right]. \tag{39}$$

We shall call this the “mean estimate of r.” The factors multiplying s have been worked out by Lola S. Deming for n running from 2 to 25 and are shown in the third column of Table IV.

Another possibility is to assume that if more samples were to be drawn, as many would be found with S.D. > s as have S.D. < s. Geometrically this is the same thing as arbitrarily placing the observed S.D. at the median $s = \sigma/f$ of the S.D. frequency curve; hence this estimate of $\sigma$ is $fs$, and by Eq. (13) it leads to

$$r_s = 0.674\ldots\; f s/\sqrt{n} = \phi s \tag{40}$$

for the corresponding estimate of r. We call this the “median estimate of r.” It is identical with the 50 percent fiducial value of r. It will be recalled that in the discussion of the median, f was defined by Eq. (18), and that values of 1/f and f (or $f_{50}$) have been shown in Tables I and III respectively. The factors $\phi = 0.674\ldots f/\sqrt{n}$ have also been given in Table III, in the column headed $\phi_{50}$. For ready comparison between median, optimum, and mean estimates, $\phi$ is again listed in Table IV.

Table IV. Factors that multiply s to get various estimates of the probable error r of n observations.

The “optimum estimate,” $r_s = 0.674\ldots\, s/\sqrt{n-1}$. (37)
The “mean estimate,” $r_s = 0.674\ldots\, s\,\sqrt{1/2\pi}\,B[\tfrac{1}{2}(n-1), \tfrac{1}{2}]$. (39)
The “median estimate,” $r_s = 0.674\ldots\, f s/\sqrt{n} = \phi s$. (40)

  n    Optimum:              Mean:                                  Median:
       0.674.../sqrt(n-1)    0.674... sqrt(1/2pi) B[(n-1)/2, 1/2]   phi = 0.674... f/sqrt(n)

  2    0.6744898             0.8453475                              1
  3    0.4769363             0.5381650                              0.5728587
  4    0.3894168             0.4226738                              0.4385007
  5    0.3372449             0.3587766                              0.3681455
  6    0.3016410             0.3170053                              0.3233389
  7    0.2753593             0.2870213                              0.2916586
  8    0.2549332             0.2641711                              0.2677514
  9    0.2384681             0.2460183                              0.2488888
 10    0.2248299             0.2311497                              0.2335170
 11    0.2132924             0.2186829                              0.2206783
 12    0.2033663             0.2080348                              0.2097462
 13    0.1947084             0.1988026                              0.2002916
 14    0.1870698             0.1906985                              0.1920093
 15    0.1802650             0.1835101                              0.1846755
 16    0.1741525             0.1770772                              0.1781222
 17    0.1686224             0.1712761                              0.1722201
 18    0.1635878             0.1660099                              0.1668682
 19    0.1589788             0.1612011                              0.1619858
 20    0.1547386             0.1567871                              0.1575083
 21    0.1508205             0.1527168                              0.1533825
 22    0.1471857             0.1489477                              0.1495648
 23    0.1438017             0.1454446                              0.1460186
 24    0.1406408             0.1421774                              0.1427132
 25    0.1376796             0.1391209                              0.1396225
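The three columns of Table IV can be recomputed as a check. The following is a minimal sketch, not the authors' procedure: the function names are hypothetical, the constant 0.674... is taken as the upper quartile of the standard normal curve, and the median factor relies on the assumption (equivalent to Helmert's Eq. (14)) that n s²/σ² follows a chi-square distribution with n − 1 degrees of freedom.

```python
import math

# 0.674...: the upper quartile of the standard normal distribution.
Z = 0.6744897501960817

def beta(a, b):
    """Complete beta function B(a, b), computed through log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def reg_lower_gamma(a, x, terms=400):
    """Regularized lower incomplete gamma P(a, x), summed from its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for k in range(1, terms):
        term *= x / (a + k)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def chi2_median(dof):
    """Median of chi-square with `dof` degrees of freedom, found by bisection."""
    lo, hi = 0.0, dof + 1.0  # the median always lies below dof
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(0.5 * dof, 0.5 * mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def optimum_factor(n):
    """Second column: 0.674... / sqrt(n - 1), Eq. (37)."""
    return Z / math.sqrt(n - 1)

def mean_factor(n):
    """Third column: 0.674... sqrt(1/2pi) B[(n-1)/2, 1/2], Eq. (39)."""
    return Z * math.sqrt(1.0 / (2.0 * math.pi)) * beta(0.5 * (n - 1), 0.5)

def median_factor(n):
    """Fourth column: phi = 0.674... f / sqrt(n), Eq. (40).

    With m the chi-square median, the median of s is sigma * sqrt(m / n),
    so f = sqrt(n / m) and phi = 0.674... / sqrt(m).
    """
    return Z / math.sqrt(chi2_median(n - 1))

for n in (2, 3, 10, 25):
    print(n, round(optimum_factor(n), 7), round(mean_factor(n), 7),
          round(median_factor(n), 7))
```

The incomplete gamma series is written out by hand only to keep the sketch free of third-party dependencies; a statistics library would serve equally well.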

There are other possibilities without number. Only two more will be mentioned. One is to place the observed S.D. at the mode (maximum) of the S.D. frequency curve; this leads to

$$r_s = 0.674\ldots\; s/(n-2)^{1/2}, \tag{41}$$

which may be called the “modal estimate of r.” Another is to assume that the observed $s^2$ is the mean square of all the standard deviations that would be obtained from a very large number of samples. It is a simple matter to prove by Helmert’s equation that the mean square of the standard deviation in a very large number of samples is $\sigma^2(n-1)/n$. Thus, using Helmert’s Eq. (14),

$$\overline{s^2} = \int_0^\infty s^2 (s/\sigma)^{n-2} \exp(-ns^2/2\sigma^2)\,ds \Big/ \int_0^\infty (s/\sigma)^{n-2} \exp(-ns^2/2\sigma^2)\,ds = \frac{2\sigma^2}{n}\,\Gamma\!\left(\frac{n+1}{2}\right) \Big/ \Gamma\!\left(\frac{n-1}{2}\right) = \sigma^2(n-1)/n. \tag{42}$$

This scheme of estimating $\sigma$ brings in the factor $(n-1)/n$ and therefore leads to none other than the optimum value, and the corresponding estimate of r is identical with Eq. (37).
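The ratio of integrals in Eq. (42) can be checked numerically. This is a minimal sketch under stated assumptions: the function names are hypothetical, the integrals are cut off at 12σ (where the integrand is negligible), and a plain midpoint rule stands in for the analytic evaluation by gamma functions.

```python
import math

def helmert_integral(n, sigma, power, upper_in_sigmas=12.0, steps=20000):
    """Midpoint-rule integral of s**power * (s/sigma)**(n-2) * exp(-n s^2 / 2 sigma^2)
    from 0 to upper_in_sigmas * sigma."""
    upper = upper_in_sigmas * sigma
    h = upper / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        total += (s ** power) * (s / sigma) ** (n - 2) * math.exp(
            -n * s * s / (2.0 * sigma * sigma))
    return total * h

def mean_square_s(n, sigma):
    """Second moment of s under Helmert's curve: the ratio of integrals in Eq. (42)."""
    return helmert_integral(n, sigma, 2) / helmert_integral(n, sigma, 0)

for n in (2, 5, 25):
    # The two printed numbers should agree closely: numeric ratio vs. sigma^2 (n-1)/n.
    print(n, mean_square_s(n, 1.0), (n - 1) / n)
```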

It is not necessary to know the distribution of standard deviations in samples in order to find the mean square standard deviation; it can be found by writing Eq. (9) for each of a large number N of samples of n items each, and adding the N equations so obtained. This procedure gives

$$\frac{1}{Nn}\sum_{1}^{N}\sum_{1}^{n}\epsilon^2 = \frac{1}{N}\sum_{1}^{N}s^2 + \frac{1}{N}\sum_{1}^{N}u^2.$$

The left-hand side of the last equation is the mean square (true) error in N samples, and is therefore $\sigma^2$. The first term on the right is $\overline{s^2}$, the mean square value of s, and the last term is the mean square error of the means of the N samples, which by Eq. (12) is simply $\sigma^2/n$. So we have

$$\sigma^2 = \overline{s^2} + \sigma^2/n$$

or

$$\overline{s^2} = \sigma^2(n-1)/n,$$

as before. This is the method adopted in some textbooks for the derivation of Eq. (37). Usually, however, the texts forget to warn the reader that it is only the average value of $s^2$ that is equal to $\sigma^2(n-1)/n$; the S.D. s of any one sample may give an estimate that differs considerably from the true value. Further, the texts usually do not mention the fact that this is only one of many possible methods of estimating r.
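The warning that only the average of s² equals σ²(n−1)/n can be illustrated by simulation. The sketch below is illustrative only: the function name, seed, and sample counts are arbitrary choices, the parent population is assumed normal with mean zero, and Eq. (9) is assumed to be the per-sample identity (1/n)Σε² = s² + u² that the summed equation above implies.

```python
import random

def simulate_s_squared(n, sigma, num_samples, seed=0):
    """Draw num_samples samples of n normal errors and return each sample's s^2
    (divisor n), checking the per-sample identity behind the summed equation."""
    rng = random.Random(seed)
    s2_values = []
    for _ in range(num_samples):
        errors = [rng.gauss(0.0, sigma) for _ in range(n)]  # true errors (mu = 0)
        u = sum(errors) / n                                  # error of the sample mean
        s2 = sum((e - u) ** 2 for e in errors) / n           # squared S.D. of the sample
        mean_square_error = sum(e * e for e in errors) / n
        # Assumed form of Eq. (9): holds exactly for every single sample.
        assert abs(mean_square_error - (s2 + u * u)) < 1e-9
        s2_values.append(s2)
    return s2_values

values = simulate_s_squared(n=5, sigma=1.0, num_samples=20000)
average = sum(values) / len(values)
print("average s^2 :", average)          # close to sigma^2 (n-1)/n = 0.8
print("smallest s^2:", min(values))      # individual samples scatter widely
print("largest s^2 :", max(values))
```

The average settles near 0.8, while single samples range far on either side of it, which is the point the textbooks are said to omit.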

It is evident from a comparison of the columns of Tables I and IV that the optimum, mean, median and modal estimates are approaching coincidence as n increases. Table I and Fig. 4 have already shown that the mean, median, and mode on Helmert’s curve approach $\sigma$ as n increases, and that the values of s become restricted practically to a very small range near the abscissa $s = \sigma$. So when n is very large, $\sigma$ may be equated to $s[n/(n-1)]^{1/2}$, or, closely enough, simply to s, with considerable confidence.

(4d).  Fluctuations  in  estimates.  The  r.m.s.  error 
in  an  estimate  of  r.  Significant  figures 
Estimates of r made by maximum likelihood or any empirical method are subject to the statistical fluctuations of sampling. Just as there is no way of judging how much significance dare be attached to the mean $\bar{x}$ of a sample without knowing the r.m.s. fluctuation $\sigma/\sqrt{n}$ (i.e., the S.D.) of the means of such samples, or what amounts to the same thing, their probable error r, so there is no way of judging the significance of an estimate of $\sigma$ or of r without having some idea of the r.m.s. fluctuation of such estimates. It is therefore desirable to study the precision of estimates of the probable error.

Every method for estimating $\sigma$ from the sample alone places

$$\sigma_s = \omega s, \tag{43}$$

where $\omega$ is some function of n that approaches unity as n increases. This proposed relation between s and $\sigma$ gives, from Eq. (13),

$$r_s = 0.674\ldots\,\sigma_s/\sqrt{n} = 0.674\ldots\,\omega s/\sqrt{n} \tag{44}$$

for the corresponding estimate of r. The error in writing $r_s$ for r is

$$r_s - r = (0.674\ldots/\sqrt{n})(\omega s - \sigma). \tag{45}$$

If this were written for an indefinitely large number of samples, the mean square error committed would be the average of $(r_s - r)^2$ taken over all samples. Now $y\,ds$ in Eq. (14) is the number of samples having S.D. $s \pm \tfrac{1}{2}ds$, so with this it is easy to write down the contribution from each interval ds between $s = 0$ and $s = \infty$ toward the sum of $(r_s - r)^2$. The sum of all these contributions divided by N is the desired average of $(r_s - r)^2$, wherefore

$$\overline{(r_s - r)^2} = (1/N)(0.674\ldots/\sqrt{n})^2 \int_0^\infty (\omega s - \sigma)^2\, y\,ds = (1/N)(0.674\ldots\,\sigma/\sqrt{n})^2 \int_0^\infty (1 - 2\omega s/\sigma + \omega^2 s^2/\sigma^2)\, y\,ds.$$


The three integrations that arise from the three terms in the parenthesis correspond, save for constant factors, to the integrations that would be used for computing the zero, first, and second moments of the area under Helmert’s curve, Eq. (14), all of which have been found. The zero moment is of course unity; the first moment or mean is $\bar{s}$ and is given by Eq. (16); and the second moment is $\overline{s^2} = \sigma^2(n-1)/n$, as was found in Eq. (42). Whereupon it follows that

$$\overline{(r_s - r)^2} = r^2\{1 - 2\omega\bar{s}/\sigma + \omega^2(n-1)/n\}, \tag{46}$$

and that the

$$\frac{\text{r.m.s. error in writing } r_s \text{ for } r}{r} = \{1 - 2\omega\bar{s}/\sigma + \omega^2(n-1)/n\}^{1/2}. \tag{47}$$

For convenience, the right-hand member of this equation will be designated by the letter F.

For the optimum estimate of r, $\omega = [n/(n-1)]^{1/2}$, so the r.m.s. error in the classical formula Eq. (37) is, in units of r,

$$\{2 - 2(\bar{s}/\sigma)\sqrt{n/(n-1)}\}^{1/2} = \left\{2 - 2\sqrt{2\pi/(n-1)}\,\Big/\,B[\tfrac{1}{2}(n-1), \tfrac{1}{2}]\right\}^{1/2} \rightarrow \{2(n-1)\}^{-1/2}\left\{1 - \frac{1}{16(n-1)}\right\}. \tag{48}$$

The gamma functions come from the value of $\bar{s}$ given in Eq. (16). Helmert published this result in 1876. It was to this end that he derived Eq. (14).

Eq. (47) gives the r.m.s. error of any estimate of r in fractional parts of r. But when r is unknown, we have only $r_s$, and this increases and decreases with $\omega$. Hence it would be interesting to express the r.m.s. error of an estimate in units of $r_s$. To accomplish this it is only necessary to multiply Eq. (47) through by $r/r_s = \sigma/\omega s$. Accordingly the

$$\frac{\text{r.m.s. error in writing } r_s \text{ for } r}{r_s} = (\sigma/\omega s)F, \tag{49}$$

F being, as already noted, the expression in $\omega$ on the right-hand side of Eq. (47). We shall call the expression $(\sigma/\omega s)F$, just derived, the proportional r.m.s. error in $r_s$, and shall abbreviate it “p.r.m.s.” error. It has its minimum value when $\omega = \sigma/\bar{s}$, as is easily found by equating to zero the derivative of $(\sigma/\omega s)F$ with respect to $\omega$. This result shows that the mean estimate of r, given in Eq. (39), has the smallest possible p.r.m.s. error.
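The minimization mentioned above can be written out in a few lines. This is a sketch in which s is held fixed and $t = 1/\omega$ is an auxiliary symbol introduced here only for convenience; it does not appear in the original.

```latex
\begin{aligned}
\left(\frac{\sigma}{\omega s}\,F\right)^{2}
  &= \frac{\sigma^{2}}{s^{2}}
     \left\{ t^{2} - 2\,\frac{\bar{s}}{\sigma}\,t + \frac{n-1}{n} \right\},
  \qquad t = \frac{1}{\omega}, \\[4pt]
\frac{d}{dt}\left\{ t^{2} - 2\,\frac{\bar{s}}{\sigma}\,t + \frac{n-1}{n} \right\}
  &= 2t - 2\,\frac{\bar{s}}{\sigma} = 0
  \;\Longrightarrow\;
  t = \frac{\bar{s}}{\sigma},
  \qquad \omega = \frac{\sigma}{\bar{s}} .
\end{aligned}
```

Since the coefficient of $t^2$ is positive, the stationary point is a minimum, and minimizing the square minimizes the p.r.m.s. error itself.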

Like $\sigma$ and r, the p.r.m.s. error $(\sigma/\omega s)F$ can only be estimated from a sample; its true value, as far as can be learned from the sample, remains unknown. Now it so happens that the estimate of $(\sigma/\omega s)F$ is simply F, since the estimate of $\sigma$ is $\sigma_s$, and $\sigma_s/\omega s$ is unity by Eq. (43).

We therefore have shown that

$$F = \{1 - 2\omega\bar{s}/\sigma + \omega^2(n-1)/n\}^{1/2} \tag{50}$$

is not only by Eq. (47) the r.m.s. error in $r_s$, in units of r, but that it is also the estimated p.r.m.s. error in $r_s$.

To get the estimated p.r.m.s. error of the optimum (classical) estimate of r, we put $\omega = [n/(n-1)]^{1/2}$ in the expression just written for F, and for the mean estimate we put $\omega = \sigma/\bar{s}$. The numerical values of F for these two estimates are given in Table V for n running from 2 to 10.


The estimated p.r.m.s. error F in either estimate of r is seen to be roughly 25 percent when n = 9, and it increases rapidly as n decreases. Evidently, then, an estimate of $\sigma$ is subject to rather violent fluctuations when n is very small.

In the last column of Table V are shown values of $1/[2(n-1)]^{1/2}$ for comparison with the second and third columns. Evidently $1/[2(n-1)]^{1/2}$ comes about midway between the optimum and mean values of F; it is a little larger than the former and a little smaller than the latter. It is perhaps a good enough approximation for either estimate even down to n = 2 and 3, since little significance can be attached to such small samples anyway. The values of F in the second and third columns of Table V clearly approach those of $1/[2(n-1)]^{1/2}$ in the last column. It should be mentioned that Helmert in his 1876 paper gave a three-place table of F for the optimum estimate running from n = 2 to n = 8, and compared it with $1/[2(n-1)]^{1/2}$.

On account of certain considerations arising from the notion of maximum likelihood, it is probably safe to say that when an estimate of r is to be made from the sample alone, there is no better procedure than the classical one of using the optimum estimate, Eq. (37). We have here discussed other ways of estimating the probable error mainly to emphasize the fact that all of them are subject to fluctuations arising from the sampling distribution of s as given by Helmert in Eq. (14).

Table V. Values of

$$F = \{1 - 2\omega\bar{s}/\sigma + \omega^2(n-1)/n\}^{1/2} \tag{50}$$

for the optimum (classical) and the mean estimates of the probable error. Comparison with $1/[2(n-1)]^{1/2}$. For the optimum estimate, $\omega = [n/(n-1)]^{1/2}$. For the mean estimate, $\omega = \sigma/\bar{s} = \sqrt{n/2\pi}\;B[\tfrac{1}{2}(n-1), \tfrac{1}{2}]$.

  n    F (optimum)    F (mean)     1/[2(n-1)]^(1/2)

  2    0.6357915      0.7555106    0.7071068
  3    0.4770180      0.5227231    0.5000000
  4    0.3966920      0.4220157    0.4082483
  5    0.3464517      0.3629993    0.3535534
  6    0.3113427      0.3232123    0.3162278
  7    0.2850656      0.2941050    0.2886751
  8    0.2644600      0.2716367    0.2672612
  9    0.2477471      0.2536224    0.2500000
 10    0.2338406      0.2387648    0.2357023
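The entries of Table V follow directly from Eq. (50). Here is a minimal sketch for recomputing them; the function names are hypothetical, and the mean of s is assumed to take the form $\bar{s}/\sigma = \sqrt{2/n}\,\Gamma(n/2)/\Gamma(\tfrac{1}{2}(n-1))$, which is the form of Eq. (16) consistent with the beta-function expression quoted for the mean estimate.

```python
import math

def s_bar_over_sigma(n):
    """Mean of s in units of sigma (assumed form of Eq. (16)):
    sqrt(2/n) * Gamma(n/2) / Gamma((n-1)/2)."""
    return math.sqrt(2.0 / n) * math.exp(
        math.lgamma(0.5 * n) - math.lgamma(0.5 * (n - 1)))

def F(n, omega):
    """Eq. (50): r.m.s. error of r_s in units of r for a given factor omega."""
    return math.sqrt(1.0 - 2.0 * omega * s_bar_over_sigma(n)
                     + omega ** 2 * (n - 1) / n)

def F_optimum(n):
    """omega = [n/(n-1)]^(1/2), the optimum (classical) estimate."""
    return F(n, math.sqrt(n / (n - 1)))

def F_mean(n):
    """omega = sigma / s_bar, the mean estimate."""
    return F(n, 1.0 / s_bar_over_sigma(n))

def F_approx(n):
    """The simple comparison value 1/[2(n-1)]^(1/2)."""
    return 1.0 / math.sqrt(2.0 * (n - 1))

for n in range(2, 11):
    print(n, round(F_optimum(n), 7), round(F_mean(n), 7), round(F_approx(n), 7))
```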

If n is so large that the sampling distribution of s (Helmert’s Eq. (14)) can be considered normal, its area can be divided into quarters that for practical purposes are symmetrically situated about the mean. An estimate of r then has a probable error, and since the curve is normal, this probable error will be 0.674... times the S.D. or the r.m.s. fluctuation. But we have already observed from Table V that the r.m.s. errors in the optimum and mean estimates approach $1/[2(n-1)]^{1/2}$, and when n is large enough for one of these estimates to have a probable error, any of the other possible estimates that have been considered would have practically the same r.m.s. error; hence we can say that when a probable error of an estimate of the probable error r exists, its estimated value is $0.674\ldots\,r_s/[2(n-1)]^{1/2}$, that is, the

$$\text{estimated probable error in } r_s = 0.674\ldots\,r_s/[2(n-1)]^{1/2} = \gamma r_s/\sqrt{n-1}, \tag{51}$$

$\gamma$ having the value $0.674\ldots/\sqrt{2} = 0.4769\ldots$ as in Eq. (29). This is often loosely called “the probable error of the probable error.” Strictly, the probable error r has no probable error, since it is a definite, though perhaps unknown, magnitude for any set of n observations. The estimate $r_s$ made from the sample alone does, however, always have a r.m.s. error, but cannot have a probable error, as just explained, unless n is so large that the distribution of s is practically normal. This condition is perhaps approached closely enough when n = 20, but of course no definite line can be drawn there. Now either from choice or circumstances, 20 is about as large a number of observations as physicists are in the habit of taking, so that only rarely does an estimate of r actually have a probable error. It therefore seems best to deal exclusively with the estimated p.r.m.s. error of $r_s$, which has been designated by the letter F in Eq. (47), calculated in Table V, and which is well enough approximated by the simple expression $1/[2(n-1)]^{1/2}$. Accordingly, the mean of n observations, together with either the optimum or the mean estimate of the probable error, should then be written

$$\bar{x} \pm r_s\left(1 \pm 1/[2(n-1)]^{1/2}\right).$$

In so doing, it is important to remember that although $r_s$ is the estimated probable error in $\bar{x}$, the quantity $1/[2(n-1)]^{1/2}$ is the estimated proportional r.m.s. error in $r_s$.

Only  when  n is  large  can  any  reasonable  degree 
of  belief  be  placed  in  an  estimate  of  r.  For  this 
reason  a statement  of  the  estimated  probable 
error $r_s$ is by itself of little use; we require also
the  source  of  this  estimate  and  whether  it  be 
from  5 observations  or  from  25.  If  it  is  from  5 
observations  we  know  immediately  that  it  is 
subject  to  an  estimated  p.r.m.s.  error  of  over 
one-third  and  it  must  therefore  not  be  taken  too 
seriously.  One  way  of  overcoming  this  difficulty 
is  to  bring  in  prior  knowledge  by  the  methods 
to  be  outlined  later,  but  this  is  not  always 
feasible  nor  possible.  On  the  other  hand,  if  the 
estimate  is  made  from  25  observations,  some 
significance  can  be  attached  to  it.  In  publishing 
an  estimate  made  from  a sample  alone,  either  n 
or  the  estimated  p.r.m.s.  error  should  be  stated. 
Thus,  the  result  of  the  10  observations  made  on 
a micrometer,  previously  considered,  should  be 
written 

1.0760±0.0008(1±0.24) 

or 

1.0760±0.0008,  (10  observations). 

Either line conveys the information that the estimated probable error is subject to considerable doubt. The estimated p.r.m.s. error tells how many figures are significant in $r_s$, and in turn tells how many are significant in $\bar{x}$. A
proper  appreciation  of  these  principles  is  essential 
when  correcting  data  for  systematic  errors,  or 
when  drawing  any  conclusion  from  experimental 
results. 
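The recommended way of reporting a result can be sketched in code. What follows is an illustration only: the function names are hypothetical, the ten readings are invented for the example (they are not the micrometer data of the text), and the optimum estimate of Eq. (37) is used together with the approximation $1/[2(n-1)]^{1/2}$ for the estimated p.r.m.s. error.

```python
import math

Z = 0.6744897501960817  # 0.674...: upper quartile of the standard normal curve

def report(observations):
    """Return (mean, r_s, estimated p.r.m.s. error of r_s) for a list of readings."""
    n = len(observations)
    mean = sum(observations) / n
    # S.D. of the sample with divisor n, as used throughout the text.
    s = math.sqrt(sum((x - mean) ** 2 for x in observations) / n)
    r_s = Z * s / math.sqrt(n - 1)           # optimum estimate, Eq. (37)
    prms = 1.0 / math.sqrt(2.0 * (n - 1))    # approximation to F
    return mean, r_s, prms

def format_result(observations, places=4):
    """Format as: mean ± r_s (1 ± p.r.m.s.), (n observations)."""
    mean, r_s, prms = report(observations)
    return (f"{mean:.{places}f} ± {r_s:.{places}f} (1 ± {prms:.2f}), "
            f"({len(observations)} observations)")

# Hypothetical readings, invented purely for illustration.
readings = [1.074, 1.079, 1.076, 1.072, 1.078, 1.077, 1.075, 1.080, 1.073, 1.076]
print(format_result(readings))
```

For any ten observations the bracketed factor comes out as 0.24, and for five it exceeds one-third, in agreement with the figures quoted in the text.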

(4e). The posterior method. The prior and posterior curves for $\sigma$

Estimates of $\sigma$ obtained by maximum likelihood or by any empirical method are based on the sample alone and hence are subject to statistical fluctuations. They take no account of knowledge concerning $\sigma$ that may exist in varying amounts before the sample is taken. The confidence that any one places in an estimate made by one of the foregoing devices will depend in some manner on his previously formed ideas concerning the range in which $\sigma$ lies and on how large the sample is. As n is indefinitely increased, previous experience and ideas are gradually and unconsciously relegated into insignificance.

The  posterior  method  of  reasoning  combines 
prior  knowledge  with  the  information  contained 
in  the  sample.  It  is  applied  in  a qualitative  way 
quite  generally.  Everyone  who  thinks  to  himself, 
“This  result  seems  higher  (or  lower)  than  I had 
for  good  reasons  expected  to  find  it;  I wonder 
therefore  if  by  chance  it  is  not  too  high  (or  too 
low),”  is  combining  prior  knowledge  with  new 
information  provided  by  the  sample  and  is 
therefore  employing,  qualitatively,  the  posterior 
method. 

Prior knowledge concerning $\sigma$ may range from none at all to the ability to place it within very narrow limits. As an example of the latter situation we may cite cases where it is possible to make a long series of measurements (perhaps a hundred) on a single magnitude. The S.D. of this long series multiplied by $(100/99)^{1/2}$ may confidently be adopted as the correct value of $\sigma$ for computing the probable error of subsequent shorter series of observations made with the same instrument and under the same conditions. In such a situation, the value of $\sigma$ is established so definitely that the S.D. of the subsequent small samples need not be computed at all, and the uncertainties of trying to estimate $\sigma$ from each one of them alone are eliminated.

At the other extreme stands the less fortunate situation where nothing at all is known regarding $\sigma$ and where there is no hope of taking a longer series of measurements under comparable conditions in order to establish it. Between the two extremes come more or less hazy notions, often no more than enough to state wide limits between which $\sigma$ must lie. At other times the limits may be narrower.

These notions might be expressed graphically in a probability curve, to be called a prior existence curve, so drawn that the area between any two abscissas is the probability of finding $\sigma$ between them. The total area under the curve would then be unity, since $\sigma$ must lie somewhere within the range of the curve. A simple curve is shown in Fig. 12. Here it is supposed that $\sigma$ is known to lie somewhere between 1 and 2, and the probability that it lies in any intermediate interval is proportional to the width of that interval; hence the curve is flat. Such a prior curve, having finite discontinuities at $\sigma = 1$ and $\sigma = 2$, would, of course, never be used in practice, but it is a convenient one mathematically and so will serve well for the first example.

Prior knowledge can sometimes be obtained after the n observations are taken as well as before. Our adjectives relating to time are chosen for convenience to fit the usual descriptions of the law of causality, but they may be changed if desired.

Fig. 12. Prior and posterior estimates of $\sigma$.

$\phi(\sigma)$ vs. $\sigma$. The prior existence curve for $\sigma$ shows the state of knowledge concerning the S.D. of the parent population before a sample is drawn from it. In this example, $\sigma$ is known to lie with constant probability between 1 and 2.

$p(\sigma)$ vs. $\sigma$. The posterior curve for $\sigma$ shows the state of knowledge concerning the S.D. of the parent population after a sample has been drawn and its S.D. computed. Here, the sample was found to have a S.D. of 3/2. The probability is no longer constant between 1 and 2, but becomes more and more concentrated about the point $\sigma = s$ as n increases. The area under all curves is unity.

In  the  situation  where  a long  series  of  observa- 
tions has  provided  rather  definite  information, 
the  prior  existence  curve  would  have  nearly  all 
of  its  area  enclosed  in  a narrow  strip  centered  at 
the  S.D.  of  the  long  series.  The  exact  shape  of 
the  curve  over  this  short  interval  would  be 
unimportant. 

A horizontal line extending from very small to very large values of $\sigma$ and including unit area with the $\sigma$ axis implies that the S.D. of the parent population has equal probability in equal ranges. Such a graph might seem to be the appropriate prior existence curve in the absence of any previous knowledge whatever concerning the precision of a set of observations; but if there is no knowledge concerning $\sigma$, then there is none concerning $\sigma^2$, $\sigma^3$, $\ln\sigma$, ···; and if the horizontal line expresses ignorance of $\sigma$, it must also express ignorance of these functions of $\sigma$. But if $\sigma$ has equal probability in equal ranges, then $\sigma^2$, $\sigma^3$, $\ln\sigma$, ··· do not have equal probabilities in equal ranges of $\sigma^2$, $\sigma^3$, $\ln\sigma$, ···. So it appears hazardous to attempt to express mathematically a state of complete ignorance concerning $\sigma$. Nevertheless, Harold Jeffreys has argued that the correct procedure in such cases is to make the ordinates on the prior existence curve proportional to $\sigma^{-1}$, i.e., to assume that $\ln\sigma$ is uniformly distributed. Be that as it may, it will be clear later that when the prior information is so hazy that there is difficulty in expressing it, the posterior method is affected by the statistical fluctuations of small samples nearly as much as the estimates made by maximum likelihood or any empirical method, and so is hardly worth the effort. Jeffreys’ curve is a special case of one introduced by Molina and Wilkinson in 1929, which will be studied later.
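The remark about functions of $\sigma$ can be made concrete with the ordinary change-of-variable rule for probability densities. This is a sketch; the symbols $v = \ln\sigma$ and $w = \sigma^2$ are introduced here for illustration and do not appear in the original.

```latex
\begin{aligned}
\phi(\sigma)\,d\sigma \propto \sigma^{-1}\,d\sigma,\quad v = \ln\sigma
  &\;\Longrightarrow\;
  \phi(\sigma)\left|\frac{d\sigma}{dv}\right| dv
  \;\propto\; \sigma^{-1}\cdot\sigma\;dv \;=\; dv, \\[4pt]
\phi(\sigma)\,d\sigma \propto d\sigma,\quad w = \sigma^{2}
  &\;\Longrightarrow\;
  \phi(\sigma)\left|\frac{d\sigma}{dw}\right| dw
  \;\propto\; \frac{dw}{2\sqrt{w}} .
\end{aligned}
```

The first line shows that ordinates proportional to $\sigma^{-1}$ make $\ln\sigma$ uniformly distributed; the second shows that a flat curve for $\sigma$ is not flat for $\sigma^2$.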

The quantitative application of the posterior method of approaching the parent population is always possible by Laplace’s generalization of Bayes’ theorem provided the state of prior knowledge is expressed graphically or analytically in a prior existence curve. The process involves only simple principles in the theory of probability.

If $\phi(\sigma)$ is the ordinate on the prior existence curve at the abscissa $\sigma$, then $\phi(\sigma)\,d\sigma$ is the prior existence probability, that is, the probability that the S.D. of the parent population lies in the interval $\sigma \pm \tfrac{1}{2}d\sigma$ according to the state of knowledge

Harold Jeffreys, Scientific Inference, Ch. 5 (Cambridge University Press, 1931); Proc. Roy. Soc. A138, 48-55 (1932); Proc. Camb. Phil. Soc. 29, 83-87 (1933); Proc. Roy. Soc. A140, 523-534 (1933). Jeffreys’ arguments are disputed by R. A. Fisher, Proc. Roy. Soc. A139, 343-348 (1933).

Thomas Bayes, Phil. Trans. Roy. Soc. 53, 370-418 (1763).

Laplace, Théorie Analytique des Probabilités (1812).

Poisson, Recherches sur la Probabilité des Jugements (1837).

See also Edward C. Molina, Bull. Am. Math. Soc. 36, 369-392 (1930); Ann. Math. Stat. 2, No. 1, 23-37 (1931).

An excellent treatment of Laplace’s generalization of Bayes’ theorem is in Ch. 5 of Thornton C. Fry’s Probability and Its Engineering Uses (Van Nostrand, 1928). See also Ch. 6 in Arne Fisher’s Mathematical Theory of Probabilities (Macmillan, second edition 1922).


existing before the sample was drawn. Now if the S.D. of the parent population is $\sigma$, the probability of drawing a sample having the S.D. $s \pm \tfrac{1}{2}ds$ is, by Helmert’s Eq. (14), const. $\times\,\sigma^{-1}(s/\sigma)^{n-2}\exp(-ns^2/2\sigma^2)\,ds$. This is called the prior productive probability of $\sigma$. The probability that the S.D. of the parent population lies in the interval $\sigma \pm \tfrac{1}{2}d\sigma$ and that the S.D. of a sample of n drawn therefrom will lie in $s \pm \tfrac{1}{2}ds$ is the probability of a compound event, and will therefore be proportional to the product of the prior existence and the prior productive probabilities, namely,

$$p\,d\sigma\,ds = \text{const.} \times \phi(\sigma)\,\sigma^{-1}(s/\sigma)^{n-2}\exp(-ns^2/2\sigma^2)\,d\sigma\,ds. \tag{52}$$

We can imagine a surface of ordinate p plotted on the orthogonal axes $\sigma$ and s. Let us take a slab of thickness ds at s, parallel to the p, $\sigma$ plane. The equation of the curve made by this section is

$$p\,d\sigma = \text{const.}\;\phi(\sigma)\,\sigma^{-1}(s/\sigma)^{n-2}\exp(-ns^2/2\sigma^2)\,d\sigma.$$

$p\,d\sigma$ will be proportional to the posterior probability of $\sigma$, which is the name given to the probability that the S.D. of the parent population lies within the interval $\sigma \pm \tfrac{1}{2}d\sigma$ after the sample is drawn and found to have S.D. s. The factor of proportionality will be unity if the area under the curve is unity, as it will be if the constant is properly chosen. This is insured if the last equation is written

$$p\,d\sigma = \frac{\phi(\sigma)\,\sigma^{-1}(s/\sigma)^{n-2}\exp(-ns^2/2\sigma^2)\,d\sigma}{\int_0^\infty \phi(\sigma)\,\sigma^{-1}(s/\sigma)^{n-2}\exp(-ns^2/2\sigma^2)\,d\sigma}. \tag{53}$$

When  the  constant  factor  in  the  equation  of  any 
probability  curve  is  so  chosen  that  the  total 
area  under  the  curve  is  unity,  the  equation  is 
said  to  be  “normalized,”  and  the  required 
constant  factor  is  called  the  “normalizing  factor.” 
It  simply  serves  to  identify  unity  with  certainty. 
As  in  the  equation  just  written,  the  process  of 
normalization is nearly always most conveniently accomplished by writing a denominator
identical  with  the  numerator,  and  then  inte- 
grating in  the  denominator  over  all  values  of 
the  variable  whose  probability  is  being  written. 

The following example will illustrate the use of the method and will exhibit some of its features. We shall suppose that before any sample is drawn, $\sigma$ is known to lie between 1 and 2, and that equal intervals are equally probable in this range. Then the ordinates of the prior existence curve will be

$$\phi(\sigma) = \begin{cases} 0, & 0 < \sigma < 1 \\ 1, & 1 < \sigma < 2 \\ 0, & 2 < \sigma \end{cases} \tag{54}$$

The graph is shown dashed in Fig. 12. Now let us suppose that a sample of 6 is drawn and that its S.D. is computed and found to be 1.5. Are all values of $\sigma$ between 1 and 2 equally probable now? The posterior curve furnishes the answer. Its ordinates are found by substituting the proper values of $\phi(\sigma)$, n, and s into Eq. (53). The result is

$$\begin{aligned}
p(\sigma) &= 0, && 0 < \sigma < 1 \\
p(\sigma)\,d\sigma &= \frac{\sigma^{-1}(s/\sigma)^{n-2}\exp(-ns^2/2\sigma^2)\,d\sigma}{\int_1^2 \sigma^{-1}(s/\sigma)^{n-2}\exp(-ns^2/2\sigma^2)\,d\sigma} = 187.13\,\sigma^{-5}\exp(-27/4\sigma^2)\,d\sigma, && 1 < \sigma < 2 \\
p(\sigma) &= 0, && 2 < \sigma
\end{aligned} \tag{55}$$

The normalizing factor 187.13 was obtained by using the Tables of the Incomplete Gamma Function to evaluate the denominator in the preceding line.

Eq. (55) is plotted in the same figure. Instead of being flat, the posterior curve has a maximum. Approximately half the area is included between the abscissas 1.46 and 1.86, so the location of $\sigma$ is now a little more definite than it was. The area of the posterior curve would be more concentrated, and $\sigma$ more definitely located, if the prior curve had had a maximum near the middle instead of being flat.

If n had been 24 instead of 6, the equation of the posterior curve would have been

$$p(\sigma)\,d\sigma = 33.371 \times 10^{8}\,\sigma^{-23}\exp(-27/\sigma^2)\,d\sigma, \qquad 1 < \sigma < 2. \tag{56}$$

This is also shown in the figure. The area is now much more concentrated in the neighborhood of the maximum, so that with n = 24 we should have a much better idea of where $\sigma$ actually lies.
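The normalizing factors of the two posterior curves can be checked by direct numerical integration in place of the Tables of the Incomplete Gamma Function. The sketch below makes several assumptions that should be read as such: the function names are hypothetical, Simpson's rule is a stand-in for the tabular method, and the constant factor $s^{n-2}$ is dropped from the integrand because it cancels between numerator and denominator of Eq. (53).

```python
import math

def posterior_shape(sigma, n, s):
    """Unnormalized posterior ordinate for the flat prior of Eq. (54):
    sigma^-(n-1) * exp(-n s^2 / 2 sigma^2)."""
    return sigma ** (-(n - 1)) * math.exp(-n * s * s / (2.0 * sigma * sigma))

def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; `steps` is rounded up to an even number."""
    if steps % 2:
        steps += 1
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

def normalizing_factor(n, s, lo=1.0, hi=2.0):
    """Reciprocal of the integral of the unnormalized posterior over the prior range."""
    return 1.0 / simpson(lambda x: posterior_shape(x, n, s), lo, hi)

def posterior_area(n, s, a, b, lo=1.0, hi=2.0):
    """Posterior probability that sigma lies between a and b."""
    k = normalizing_factor(n, s, lo, hi)
    return k * simpson(lambda x: posterior_shape(x, n, s), a, b)

print("n = 6  factor:", normalizing_factor(6, 1.5))
print("n = 24 factor:", normalizing_factor(24, 1.5))
print("area between 1.46 and 1.86 (n = 6):", posterior_area(6, 1.5, 1.46, 1.86))
```

Both factors should come out within a percent of the printed 187.13 and 33.371 × 10^8, and the area between 1.46 and 1.86 should be close to one half.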

The posterior method furnishes a probability curve for $\sigma$ by changing the prior existence curve in accordance with the new information contained in the sample. Before the sample was drawn the probability was given by $\phi(\sigma)$; afterward, by $p(\sigma)$.

The shape of the posterior curve changes more or less as s changes; it is therefore not entirely free from the statistical fluctuations of sampling. Just how sensitive it is to variations in s will depend on how large n is and on how definite the prior information was; as one would expect, when the prior curve confines $\sigma$ to fairly narrow limits and n is not large, variations in s have little effect; in fact if the prior information is extremely definite, a very large value of n will be required to affect noticeably the posterior curve through changes in s. This is why the value of $\sigma$ that has once been established by means of a long series of measurements can be used for subsequent shorter series; the standard deviations of these shorter series need not be computed at all because their influence on the posterior curve would be negligible. However, if the prior information fixes $\sigma$ only loosely, the sample may influence the posterior curve considerably, even when n is small. When n is large, the posterior curve rises to a sharp peak at the abscissa provided by maximum likelihood, irrespective of the shape of the prior curve. Furthermore, as n increases, the fluctuations in s become inappreciable. It is therefore correct to say that a value of $\sigma$ can be established by taking a long series of measurements.

The  form  of  the  prior  existence  curve  shown 
in  Fig.  12  is  useful  for  illustration,  but  on 
account  of  its  discontinuities  it  lacks  some  of  the 
practical  features  of  the  curve  proposed  by 
Molina  and  Wilkinson  to  be  considered  in  a 
later  section. 

(4f). Further remarks on the method of maximum likelihood

Before leaving the prior existence curve of Fig. 12 it may be worth while to
examine further the position of the maximum of the resulting posterior
curve. Starting with a flat prior existence curve like that in Fig. 12, the
maximum will always come at the abscissa $\sigma = s[n/(n-1)]^{1/2}$.
This result arises, of course, from differentiating with respect to $\sigma$
the expression for $p(\sigma)$ in Eq. (55), holding $s$ constant, and
setting the derivative equal to zero. The resulting relation between
$\sigma$ and $s$ is independent of the denominator, which is merely a
constant; hence this relation is independent of the range over which the
prior existence curve extends, provided only that it is flat. If the prior
existence curve for $\sigma$ were other than flat, the maximum on the
posterior curves would in general lie elsewhere, because $p(\sigma)\,d\sigma$
would no longer be given by the right-hand side of Eq. (55) nor anything
proportional to it, but would be given by Eq. (53) wherein $\phi(\sigma)$
would not be a constant but some function of $\sigma$.

Now the position of the maximum (or the mode) on the posterior curve that
comes from using a flat prior existence curve for $\sigma$ turns out to be
identically the same relation between $\sigma$ and $s$ as was obtained in
Eq. (36), which was arrived at in our search for the parent population that
is most favorable (or most likely in Fisher's sense) to the S.D. that was
actually observed in the sample. It will be recalled that we arrived at this
most favorable parent population by differentiating Helmert's Eq. (14) with
respect to $\sigma$ and setting the derivative equal to zero; also that we
called this process the method of maximum likelihood, after Fisher. That the
two results (the position of the maximum on the posterior curve and the
application of the method of maximum likelihood) must be identical is
evident from the fact that when the prior existence curve for $\sigma$ is
flat, $\phi(\sigma)$ is simply a constant and the right-hand side of
Eq. (53) then expresses, save for a constant factor, the same relation
between $\sigma$ and $s$ as occurs in Helmert's equation, so that we are
really differentiating the same function in both cases.

Because of this coincidence, the method of maximum likelihood has often been
described as the process of finding the mode of the posterior curve that
arises from a flat prior existence curve. This explanation, although it
masks the true nature of the notion of maximum likelihood, would in itself
do no harm were it not that by implication it leads to misinterpretations.
Thus, as has been pointed out, the abscissa of the mode of the posterior
curve changes as the prior existence curve changes, and the particular
abscissa $\sigma = s[n/(n-1)]^{1/2}$ is the mode of the posterior curve in
general only when the prior existence curve for $\sigma$ is flat; whence
such an explanation as proposed above leads innocently to the statement that
the method of maximum likelihood is a posterior method and depends on a
uniform (flat) prior existence curve for the parameter sought (in our case,
$\sigma$). But if we had used some function of $\sigma$ such as $\sigma^2$,
$\ln\sigma$, $\ldots$ in place of $\sigma$ as the equally spaced abscissas
along the axis of the prior and posterior curves, we should likewise have
found that, starting with a flat prior curve for $\sigma^2$, $\sigma^3$,
$\ln\sigma$, $\ldots$ as the case may be, the relation
$\sigma = s[n/(n-1)]^{1/2}$ is not that existing at the mode of the new
posterior curve; whereupon any uniqueness that the method of maximum
likelihood might have seemed to possess now appears to have been an
illusion.

The resolution of the difficulties that we are led to by such an explanation
lies in the realization that the method of maximum likelihood is not a
posterior method at all. It is simply a process for finding the parent
population that is most favorable to the event that was observed to happen
(in our case a sample having S.D. $s$). Obviously the answer to such a
problem as finding the most favorable parent population should not, in fact
must not, depend on the choice of coordinates nor on any state of prior
knowledge, and it is interesting to note that if Helmert's Eq. (14) be
expressed in terms of any function of $\sigma$, rather than in terms of
$\sigma$ itself, the result of setting the derivative with respect to
$\sigma$ or any function of $\sigma$ equal to zero is always the same as
that already found in Eq. (36), namely, $\sigma = s[n/(n-1)]^{1/2}$. This
invariance is a general property of the method of maximum likelihood, and
the proof is very simple; if the function $f(x)$, continuous in any
interval, be expressed in terms of $v$ so that $f(x) = F(v)$ and $v = g(x)$
over that interval, we shall find that the values of $x$ that maximize or
minimize $f(x)$ correspond through the relation $v = g(x)$ precisely to the
values of $v$ that maximize or minimize $F(v)$, provided $dv/dx$ is neither
$0$ nor $\infty$.
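The invariance just described can be illustrated numerically. The sketch below assumes the Helmert form used in Eq. (55), takes n = 6 and s = 1.5 purely for illustration, and maximizes the same likelihood over three different coordinates (σ, σ², ln σ) with a plain golden-section search; the helper names are hypothetical.

```python
import math

def log_helmert(sigma, n, s):
    """Log of the Helmert ordinate at s, dropping terms that do not involve sigma."""
    return -(n - 1) * math.log(sigma) - n * s * s / (2.0 * sigma * sigma)

def argmax(f, lo, hi, tol=1e-10):
    """Golden-section search for the maximum of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b, d = d, c                 # maximum lies in [a, old d]
            c = b - g * (b - a)
        else:
            a, c = c, d                 # maximum lies in [old c, b]
            d = a + g * (b - a)
    return 0.5 * (a + b)

n, s = 6, 1.5
closed_form = s * math.sqrt(n / (n - 1))           # Eq. (36)

# Maximize over sigma itself, over v = sigma^2, and over w = ln(sigma).
sigma_direct = argmax(lambda x: log_helmert(x, n, s), 0.5, 5.0)
sigma_from_square = math.sqrt(argmax(lambda v: log_helmert(math.sqrt(v), n, s), 0.25, 25.0))
sigma_from_log = math.exp(argmax(lambda w: log_helmert(math.exp(w), n, s),
                                 math.log(0.5), math.log(5.0)))
print(closed_form, sigma_direct, sigma_from_square, sigma_from_log)
```

All three searches should land on the same value of σ, because each is locating the maximum of one and the same function, only plotted against a different abscissa.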

A graphical illustration of the meaning of maximum likelihood is provided by
Fig. 13, which shows three Helmert curves for $n = 6$. One curve is plotted
with the maximum likelihood value of $\sigma$, namely,
$\sigma = s_0[n/(n-1)]^{1/2}$, $s_0$ being the S.D. observed in a sample of
$n$; and the other two are plotted with slightly less and slightly greater
values of $\sigma$. At $s = s_0$, or at $s/s_0 = 1$, the ordinate along the
curve having S.D. $\sigma = s_0[n/(n-1)]^{1/2}$ is clearly greater than the
ordinates of the two other curves. This fact illustrates that out of the
infinity of parent populations that the sample could have come from, that
having the maximum likelihood value of $\sigma$ is most favorable, since it
gives the greatest possible ordinate at $s = s_0$ and therefore maximizes
the probability of drawing a sample of S.D. $s_0$.

Fig. 13. Curves illustrating the meaning of maximum likelihood (abscissa:
scale for $s/s_0$). A sample of $n$ is drawn, and its S.D. proves to be
$s_0$. The S.D. $\sigma$ of the parent population can be anything between 0
and $\infty$, but the value $\sigma = s_0\sqrt{n/(n-1)}$ is "most likely,"
for this gives the greatest possible ordinate at $s = s_0$ on the frequency
distribution curve for the S.D. of samples of $n$ (Helmert's equation). The
curves illustrate that the ordinate at $s = s_0$ is higher when
$\sigma = s_0\sqrt{n/(n-1)}$ than when $\sigma = s_0$ or
$\sigma = s_0\sqrt{n/(n-2)}$. With the latter value, the mode of the curve
comes at $s = s_0$. The equation of the curves is

$$
y\,ds = \text{const.}\;\sigma^{-1}(s/\sigma)^{n-2}\exp(-ns^2/2\sigma^2)\,ds
$$

in which $n$ has been placed equal to 6.
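The comparison drawn in Fig. 13 can be reproduced with a few lines of arithmetic. This is a sketch only: it assumes the Helmert form σ⁻¹(s/σ)ⁿ⁻² exp(−ns²/2σ²) with its constant dropped (the constant does not involve σ), puts s₀ = 1, and uses labels of our own choosing for the three curves.

```python
import math

def helmert_ordinate(s, sigma, n):
    """Helmert ordinate at s for parent S.D. sigma, omitting the sigma-free constant."""
    return (s / sigma) ** (n - 2) / sigma * math.exp(-n * s * s / (2.0 * sigma * sigma))

n, s0 = 6, 1.0
parent_sd = {
    "sigma = s0": s0,
    "maximum likelihood": s0 * math.sqrt(n / (n - 1)),
    "mode at s0": s0 * math.sqrt(n / (n - 2)),
}
# Ordinate of each curve at the observed S.D. s0.
ordinates = {label: helmert_ordinate(s0, sd, n) for label, sd in parent_sd.items()}
for label, value in ordinates.items():
    print(f"{label:20s} {value:.5f}")
```

The maximum likelihood curve should show the largest ordinate at s = s₀, even though it is the third curve, not this one, that has its own mode at s₀.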

(4g).  The  posterior  method,  continued.  The 
probability  curve  of  the  unknown  mean, 
and  the  calculation  of  the  posterior 
quartile  deviation 

One particular value of $\sigma$ gives the $u,s$ frequency surface that was
studied in previous sections. A $u,s$ frequency surface having its total
volume equal to unity but made up of contributions from several values of
$\sigma$ would be a composite surface. Its sections would no longer be the
$u$ and $s$ curves that were studied, since all values of $\sigma$ under the
prior existence curve for $\sigma$ make their contributions to the volume
according to their relative probabilities, which are designated by the
ordinates $\phi(\sigma)$.

To make the posterior method complete, it is necessary to consider also the
prior existence curve for the mean $\mu$ of the parent population. The prior
curve for $\mu$, as well as that for $\sigma$, will have its effect on the
composite surface.

We may take sections $s = \text{const.}$ on this composite $u,s$ surface,
just as before, but such sections will not now be normal curves as they were
with the simple surface. We shall assume that they are symmetrical, however;
and we shall define the "posterior quartile deviation" $r_q$ to be the
absolute magnitude of the $u$ abscissas that divide an $s$ section
symmetrically into quarters. Sometimes, if not always, these abscissas $r_q$
will vary as the $s$ coordinate of the section varies, whereas with the
simple $u,s$ surface the abscissas $\pm r$ cut all $s = \text{const.}$
sections symmetrically into quarters.

Mathematically manageable forms, allowing sufficient freedom for any degree
of prior knowledge likely to be encountered, have been introduced by Molina
and Wilkinson$^{32}$ for the prior existence probabilities of the mean $\mu$
and the S.D. $\sigma$ of the parent population. They are

$$
\phi(\sigma)\,d\sigma = \frac{1}{2^{c/2}\,\Gamma(\tfrac12 c+1)\,a}\left(\frac{a}{\sigma}\right)^{c+3}\exp(-a^2/2\sigma^2)\,d\sigma, \tag{57}
$$

$$
\theta(\mu)\,d\mu = A\left[1+\frac{1}{1+a^2/ns^2}\left(\frac{\bar x-\mu}{s}\right)^2\right]^{-b/2}d\mu. \tag{58}
$$

$a$, $b$ and $c$ are adjustable constants. $A$ can easily be found by
setting $\int_{-\infty}^{\infty}\theta(\mu)\,d\mu = 1$, but its value will
not be needed.

Graphs of Eq. (57) with $c = 3$ and $c = 10$ are shown in Fig. 14. They are
skew curves; the mode comes at $a/(c+3)^{1/2}$ and the mean at
$[a/\sqrt{2\pi}]\,B(\tfrac12(c+1), \tfrac12)$. The $\sigma$ axis is tangent
to the curves at 0 and $\infty$, where it makes high order of contact, so
extremely small and extremely large values of $\sigma$ are always excluded.
The larger $c$ is, the narrower is the range in which the greater part of
the area is confined. The two constants $a$ and $c$ permit whatever
concentration of area happens to fit the state of prior knowledge and also
permit the mean or mode of the curve to be placed at will. It will be
noticed that if $a = 0$ and $c = -2$, Molina and Wilkinson's prior curve for
$\sigma$ reduces to the one proposed by Jeffreys,$^{33}$ namely,
$\phi(\sigma) = \text{const.}\;\sigma^{-1}$, and that if $a = 0$ and
$c = -3$, we obtain the flat prior existence curve
$\phi(\sigma) = \text{const.}$
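The stated mode and mean of Eq. (57) can be checked numerically. The sketch below assumes the form φ(σ) = [2^{c/2} Γ(c/2 + 1) a]⁻¹ (a/σ)^{c+3} exp(−a²/2σ²), takes a = 1 and c = 3 as illustrative values, and truncates the integration range to (0.001, 60); all names are hypothetical.

```python
import math

def mw_prior(sigma, a, c):
    """Molina-Wilkinson prior for sigma in the form assumed for Eq. (57); a > 0, c > -2."""
    norm = 2.0 ** (c / 2.0) * math.gamma(c / 2.0 + 1.0) * a
    return (a / sigma) ** (c + 3) * math.exp(-a * a / (2.0 * sigma * sigma)) / norm

def simpson(f, lo, hi, steps):
    """Composite Simpson rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0

def beta(p, q):
    """Complete beta function B(p, q)."""
    return math.gamma(p) * math.gamma(q) / math.gamma(p + q)

a, c = 1.0, 3.0
area = simpson(lambda x: mw_prior(x, a, c), 1e-3, 60.0, 60000)
mean_numeric = simpson(lambda x: x * mw_prior(x, a, c), 1e-3, 60.0, 60000)
mean_formula = a / math.sqrt(2.0 * math.pi) * beta((c + 1.0) / 2.0, 0.5)
mode_formula = a / math.sqrt(c + 3.0)
print(area, mean_numeric, mean_formula, mode_formula)
```

With these values the area should come out close to unity and the two means should agree, which is a useful consistency check on the reconstructed normalizing factor.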


Fig. 14. Molina and Wilkinson's prior existence curve for $\sigma$, Eq. (57).
$a$ and $c$ are arbitrary constants. The area included between any two
abscissas is the prior probability that $\sigma$ lies within that interval.
The total area under each curve is unity. The curves here drawn with $c = 3$
and $c = 10$ show that increasing values of $c$ correspond to increasingly
definite prior knowledge concerning the S.D. of the parent population.

The prior curve for the mean is of the Student type (see Fig. 5). It is
symmetrical about the mean $\bar x$ of the sample, so when $b > 0$ this
curve implies that the mean of the sample is a priori to be preferred as the
mean of the parent population. When $b = 0$, the curve is flat from
$-\infty$ to $+\infty$, meaning that equal ranges from $-\infty$ to
$+\infty$ are, a priori, equally probable. This is the most conservative
value of $b$.

If $\mu$ and $\sigma$ were known, the probability of drawing a sample with
S.D. at $s \pm \tfrac12 ds$ and with mean at $\bar x \pm \tfrac12 d\bar x$
would be given immediately by Eq. (11), and can be written

$$
y(\bar x, s)\,d\bar x\,ds = C\,\sigma^{-2}(s/\sigma)^{n-2}\exp\left[-ns^2/2\sigma^2 - n(\bar x-\mu)^2/2\sigma^2\right]d\bar x\,ds. \tag{59}
$$

This is the prior productive probability of $\mu$ and $\sigma$.

The posterior probability of $\mu$ and $\sigma$, i.e., the probability that
the mean and S.D. of the parent population lie in the ranges
$\mu \pm \tfrac12 d\mu$ and $\sigma \pm \tfrac12 d\sigma$ while the mean and
S.D. of the sample lie in the ranges $\bar x \pm \tfrac12 d\bar x$ and
$s \pm \tfrac12 ds$, is given, except for the normalizing factor, by the
product of the prior existence and prior productive probabilities as
expressed in Eqs. (57), (58), (59). Finally, integration of this product
over all possible values of $\sigma$ gives the posterior probability of
$\mu$, namely

$$
y\,d\mu = \frac{d\mu\int_0^\infty \theta(\mu)\,\phi(\sigma)\,y(\bar x,s)\,d\sigma}{\int_{-\infty}^{\infty}d\mu\int_0^\infty \theta(\mu)\,\phi(\sigma)\,y(\bar x,s)\,d\sigma}. \tag{60}
$$

This is the probability, after the sample of mean $\bar x$ and S.D. $s$ has
been drawn, that the mean of the parent population lies in the interval
$\mu \pm \tfrac12 d\mu$. As usual, the denominator is simply the normalizing
factor. The constant $C$ in Eq. (59) cancels, so its value need not be
determined.

$^{32}$ E. C. Molina and R. I. Wilkinson, Bell Syst. Tech. J. 8, 632-645 (1929).

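The integrations with respect to $\sigma$ in this fraction are easily
performed when the prior existence probability functions $\theta(\mu)$ and
$\phi(\sigma)$ have the forms suggested by Molina and Wilkinson. The result
is

$$
y\,d\mu = \frac{1}{s\sqrt{1+a^2/ns^2}\;B\!\left(\tfrac12(n+1+c+b),\tfrac12\right)}\left[1+\frac{1}{1+a^2/ns^2}\left(\frac{\bar x-\mu}{s}\right)^2\right]^{-\frac12(n+2+c+b)}d\mu \tag{61}
$$

for the posterior probability of $\mu$.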

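It is here convenient to replace the error $\bar x - \mu$ by its usual
symbol $u$, $d\mu$ by $-du$, and to denote $n + 2 + c + b$ by $T$ and the
entire resulting expression by $-q(u)\,du$. Then

$$
q(u)\,du = \frac{1}{s\sqrt{1+a^2/ns^2}\;B\!\left(\tfrac12(T-1),\tfrac12\right)}\left[1+\frac{1}{1+a^2/ns^2}\left(\frac{u}{s}\right)^2\right]^{-\frac12 T}du \tag{62}
$$

is the posterior probability curve for the error $u$ when the S.D. of the
sample is $s$.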

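This is the equation for a section $s = \text{const.}$ on the composite
$u,s$ frequency surface formed by the contributions of all values of
$\sigma$ in the assumed $\phi(\sigma)$ distribution. The posterior quartile
deviation $r_q$, previously defined, is then given by the integral

$$
\int_{-r_q}^{+r_q} q(u)\,du = \tfrac12, \tag{63}
$$

since it must divide the $s = \text{const.}$ curve symmetrically into
quarters. The value of $r_q$ will then be expressed by

$$
r_q = s\,t\,(1+a^2/ns^2)^{1/2} \tag{64}
$$

where $t$ is a function of $T$ only, and satisfies

$$
\frac{1}{B\!\left(\tfrac12(T-1),\tfrac12\right)}\int_{-t}^{+t}(1+v^2)^{-\frac12 T}\,dv = \tfrac12, \qquad T = n+2+c+b. \tag{65}
$$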
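As a rough substitute for reading t from a chart, the integral condition can be solved numerically. The sketch below assumes that t is defined by requiring the normalized curve (1 + v²)^(−T/2) to enclose the stated fraction of its area between −t and +t, and that r_q follows from Eq. (64); the function names and default arguments are our own, not Molina and Wilkinson's.

```python
import math

def log_beta(p, q):
    """Logarithm of the complete beta function B(p, q)."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)

def central_area(t, T, steps=4000):
    """Fraction of the area of (1 + v^2)^(-T/2) lying between -t and +t (Simpson rule)."""
    h = t / steps
    total = 1.0 + (1.0 + t * t) ** (-T / 2.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * (1.0 + (i * h) ** 2) ** (-T / 2.0)
    half = total * h / 3.0
    return 2.0 * half / math.exp(log_beta((T - 1) / 2.0, 0.5))

def t_for(T, prob=0.5):
    """Solve central_area(t, T) = prob for t by bisection; requires T > 1."""
    lo, hi = 0.0, 1.0
    while central_area(hi, T) < prob:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if central_area(mid, T) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def posterior_deviation(n, s, a=0.0, b=0.0, c=-2.0, prob=0.5):
    """r_q by Eq. (64) with T = n + 2 + c + b; the defaults give T = n."""
    T = n + 2 + c + b
    return s * math.sqrt(1.0 + a * a / (n * s * s)) * t_for(T, prob)

print(t_for(2), t_for(3), posterior_deviation(5, 1.0))
```

Two closed-form cases make convenient checks: for T = 2 the curve is of the Cauchy type and t = 1, and for T = 3 the condition reduces to t/(1 + t²)^{1/2} = 1/2, so that t = 1/√3.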

The integral by which $t$ is determined is of the Student type; in fact $t$
is just the value of f given by Table II when the $n$ in that table is
replaced by $T$. If the integral were equated to

The value of $T$ to be used in Table II must not be confused with the actual
number of items $n$ in the sample. $T$ and $n$ are numerically the same only
when $2 + c + b = 0$, as Eq. (65) shows. In the prior existence function
assumed by Jeffreys (footnote 33), $c = -2$ and $b = 0$, and this relation
is satisfied. Since Jeffreys also assumed $a = 0$ we


0.80, 0.90, and 0.9973, the corresponding limits would determine the
posterior 80, 90, and 99.73 percentile deviations. These can be denoted by
$r_q(80)$, $r_q(90)$, $r_q(99.73)$. The posterior probable error, or 50
percent error, could be denoted by $r_q(50)$, but unless emphasis is desired
it will usually be written simply as $r_q$.

Curves showing $t$ as a function of $T$ for the four values of the integral
of Eq. (65) are shown in Fig. 15. The ordinates for the 50 percent curve
come from Table II; the others were kindly furnished by Molina and
Wilkinson. They show a similar chart in their paper. The procedure is very
simple after the constants $a$, $b$, and $c$ are settled upon. It is only
necessary to find $t$ for the abscissa $T = n + 2 + c + b$ by means of
Fig. 15; then to compute $r_q$ by Eq. (64).

It is interesting now to notice certain features in the results that have
been obtained. In Fig. 15 the ordinates for large values of $T$ drop off
more and more slowly with increase in $T$, so when $n$ is large, $t$ is not
very sensitive to changes in $n$, $b$, and $c$. Hence as $n$ increases
indefinitely, $t$ approaches coincidence with f regardless of $b$ and $c$.
Further, as $n \to \infty$, $a^2/ns^2 \to 0$ and $1 + a^2/ns^2 \to 1$;
therefore $r_q \to st \to s$f, which in turn approaches $r$ as a statistical
limit. Thus the true probable error will be attained as the sample is
indefinitely increased, irrespective of the prior information, for the
constants $a$, $b$, $c$ then have negligible influence.

have from Eq. (64) the further interesting relation that $r_q = ts =$ f$s$.
Thus when $\phi(\sigma) = \text{const.}/\sigma$, the posterior quartile
deviation is numerically equal to what may be called "Student's 50 percent
error" (see Table II and Fig. 10c and the accompanying discussion). It
should be emphasized, however, that this is a mere numerical coincidence and
that the two quantities $r_q$ and f$s$ have very different theoretical
meanings.

Fig. 15. Chart for using Molina and Wilkinson's prior existence curves. The
ordinate $t$ on any curve multiplied by $s\sqrt{1+a^2/ns^2}$ gives the
indicated posterior percentile deviation of $u$. The abscissa
$T = n + 2 + b + c$; $n$ = number in sample; $a$, $b$, $c$ are constants
used in fitting Molina and Wilkinson's curves to the prior knowledge.

When $n$ is small, the situation is different, for the value of $t$, and
hence that of $r_q$, will depend considerably on $b$ and $c$. Also, if
$a \neq 0$, the term $a^2/ns^2$ in Eq. (64) will be important on account of
its stabilizing action, for it will prevent $r_q$ from fluctuating as widely
as $s$ does. But if $a = 0$, the term $a^2/ns^2$ will be absent, and $r_q$
will be proportional to $s$, and will therefore fluctuate with $s$. This is
the situation in Jeffreys' assumption (footnote 33).

The significance of $r_q(50)$ is that according to our knowledge and beliefs
concerning $\mu$ and $\sigma$, derived from all sources including the
sample, we are willing to lay even odds that $|u| < r_q$. The significances
of $r_q(80)$, $r_q(90)$, and $r_q(99.73)$ are similar except that the odds
are 80 : 20, 90 : 10, and 99.73 : 0.27 that $|u| < r_q(80)$, $r_q(90)$, and
$r_q(99.73)$, respectively.

$r_q$ is not the probable error of the mean of $n$ observations, nor is it
an estimate of the probable error, any more than f$s$ is. $r_q$ simply
provides another statistical relation; it differs from f$s$ in that by
taking account of prior information it is not subject to fluctuations to the
same degree as $s$ and f$s$. It is interesting to note that Molina and
Wilkinson$^{32}$ made 21 different assumptions regarding the prior existence
curves for $\mu$ and $\sigma$ and thereby obtained 21 different values for
the posterior quartile deviation $r_q$. For a sample of $n = 5$ the highest
and lowest of these values of $r_q$ are closely in the ratio 2:1, which
shows that prior information may have considerable influence on $r_q$ when
$n$ is small.

(4h). The estimation of $\sigma$ from several samples

We have seen that a value of $\sigma$ can be established by taking a long
series of measurements on a particular magnitude; if $s$ is the S.D. of this
long series, we may with considerable confidence estimate $\sigma$ to be
$s[n/(n-1)]^{1/2}\,(1 \pm 1/[2(n-1)]^{1/2})$. If $n$ is large the estimated
p.r.m.s. error $1/[2(n-1)]^{1/2}$ will be small and the effect of prior
knowledge will be negligible. We may then use this value for $\sigma$ in
calculating the probable error of subsequent shorter series of observations
made under similar conditions.

Unfortunately it is not always practicable nor possible to take a long
series of measurements in order to establish a value of $\sigma$.
Oftentimes, however, there do exist records of many short series of
observations, all presumably made under approximately the same conditions
and therefore all with practically the same precision. In such cases it is
desirable to have a method for estimating $\sigma$ from these several sets
of observations.

Let there be $n_1$ observations on the mean $\mu_1$, $n_2$ on the mean
$\mu_2$, $\ldots$, $n_m$ on the mean $\mu_m$. Let the means of these $m$
series of observations be in error by the amounts $u_1, u_2, \ldots, u_m$,
and let their S.D. be $s_1, s_2, \ldots, s_m$. By writing down the
probabilities of the occurrence of errors and residuals after the manner of
the development of Helmert's equation it is not difficult to see that

$$
y\,ds_1\,ds_2\cdots ds_m = \text{const.}\;\frac{s_1^{\,n_1-2}s_2^{\,n_2-2}\cdots s_m^{\,n_m-2}}{\sigma^{\,n_1+n_2+\cdots+n_m-m}}\exp\left(-\frac{n_1s_1^2+n_2s_2^2+\cdots+n_ms_m^2}{2\sigma^2}\right)ds_1\,ds_2\cdots ds_m \tag{66}
$$

is the probability that the S.D. of the $m$ series will lie in the $m$
ranges $s_1 \pm \tfrac12 ds_1$, $s_2 \pm \tfrac12 ds_2$, $\ldots$,
$s_m \pm \tfrac12 ds_m$ while their means lie anywhere between $-\infty$ and
$+\infty$. $\sigma$ is the same for all sets since we are assuming that all
the observations are made under the same conditions as far as precision is
concerned. It is $\sigma$ that is to be estimated. To accomplish this we can
apply the method of maximum likelihood, that is, differentiate the above
expression with respect to $\sigma$, set this derivative equal to zero, and
solve for $\sigma$. The result is


$$
\sigma^2 = \frac{n_1s_1^2+n_2s_2^2+\cdots+n_ms_m^2}{n_1+n_2+\cdots+n_m-m} = s^2\,\frac{n_1+n_2+\cdots+n_m}{n_1+n_2+\cdots+n_m-m} \tag{67}
$$

where

$$
s^2 = \frac{n_1s_1^2+n_2s_2^2+\cdots+n_ms_m^2}{n_1+n_2+\cdots+n_m}. \tag{68}
$$

$s$ as here defined is just the S.D. that would be calculated for the entire
lot of $n_1+n_2+\cdots+n_m$ observations if each series of observations were
held rigid with respect to its own mean and the $m$ sample means moved into
coincidence.

Eq. (67) gives the optimum estimate of $\sigma$, found from the $m$ series
of observations. Its estimated p.r.m.s. error is very closely
$1/[2(n_1+n_2+\cdots+n_m-m)]^{1/2}$, which of course reduces to
$1/[2(n-1)]^{1/2}$ for a single set, as has already been found in Table V.
This optimum estimate, together with its estimated p.r.m.s. error, is then
statistically more reliable than an estimate made from any one of the
individual series of observations that make up the entire lot; it is also
statistically more reliable than an estimate from a subsequent short series
of measurements yet to be made under the same conditions. We should
therefore not bother to compute the S.D. of subsequent short series, but
should rather calculate their probable errors immediately by Eq. (13) using
therein the more reliable estimate of $\sigma$ that comes from Eq. (67).
There is, of course, no reason why the S.D. of any short series should not
be combined with previous ones to get a still more reliable estimate if such
a course seems advisable, and it should be noted that the form of the middle
member of Eq. (67) is such that this is very easy to accomplish. The point
that we wish to emphasize is that the S.D. of short series should not be
used by themselves if there is any way to avoid doing so.

An interesting special case is where measurements are made in duplicate.
Here $n_1 = n_2 = n_3 = \cdots = n_m = 2$, and $m$, the number of items
measured, is equal to $\tfrac12(n_1+n_2+\cdots+n_m)$. Eq. (67) then reduces
to

$$
\sigma^2 = \frac{2}{m}\,(s_1^2+s_2^2+\cdots+s_m^2) \tag{69}
$$

for the optimum estimate of $\sigma$. The S.D. of any pair of measurements
is obviously just half the difference between the pair. Now any single pair
of measurements constitutes a sample of 2 and is by Table V almost useless
for estimating $\sigma$, but if several hundred items have been measured in
duplicate, the pairs of observations can be combined and used in Eq. (69) to
get a fairly reliable estimate, since the r.m.s. error of this estimate will
be $1/(2m)^{1/2}$.

As an example in the use of Eq. (67) we take 20 samples of 5 each from the
500 readings on a spectral line that were made by one of us. The fact that
all these sets of 5 readings were observations on a single magnitude rather
than on distinct means $\mu_1, \mu_2, \ldots, \mu_{20}$ is of no consequence
in the application of Eq. (67); there is in fact an advantage for purposes
of illustration in having the 500 readings all on the same magnitude,
because after we estimate $\sigma$ by means of Eq. (67) from the 20 samples
of 5 each we shall have for comparison the still more reliable estimate
obtained from the entire 500. The 20 samples of 5 each were made up from the
500 observations in the following way: Readings No. 1, 11, 21, 31, 41
constitute the first sample, readings No. 51, 61, 71, 81, 91 constitute the
second sample, $\ldots$, readings No. 451, 461, 471, 481, 491 constitute the
tenth, readings No. 2, 12, 22, 32, 42 constitute the eleventh, readings
No. 52, 62, 72, 82, 92 the twelfth, etc. The S.D. and individual estimates
of $\sigma$ made by both the optimum and mean formulas (Eqs. (37) and (39))
are shown in Table VI. Here $n_1 = n_2 = n_3 = \cdots = 5$ and $m = 20$.
With the squares of the S.D. in the second column Eq. (67) then gives

$$
\begin{aligned}
\sigma^2 &= \frac{5\times1336+5\times1976+\cdots+5\times1464}{5+5+\cdots+5-20}\times10^{-8} \\
&= \frac{1336+1976+\cdots+1464}{16}\times10^{-8} = 1517\times10^{-8}, \\
\sigma &= 0.00389.
\end{aligned}
$$

Here the estimated p.r.m.s. error is
$1/[2(n_1+n_2+\cdots+n_m-m)]^{1/2} = 1/\sqrt{160} = 0.079$, so we write

$$
\sigma = 0.0039\,(1 \pm 0.08). \tag{70}
$$
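The arithmetic leading to Eq. (70) is easily mechanized. The sketch below assumes that each sample S.D. is computed with divisor n (as in Helmert's equation), takes the twenty values of s² from the second column of Table VI, and uses a function name of our own choosing.

```python
import math

def pooled_sigma(sizes, variances):
    """Eq. (67): sigma^2 = sum(n_i s_i^2) / (sum(n_i) - m), each s_i^2 having divisor n_i.

    Returns the estimate of sigma and its estimated fractional r.m.s. error,
    1 / sqrt(2 (sum(n_i) - m)).
    """
    m = len(sizes)
    dof = sum(sizes) - m
    sigma_sq = sum(n * v for n, v in zip(sizes, variances)) / dof
    return math.sqrt(sigma_sq), 1.0 / math.sqrt(2.0 * dof)

# Second column of Table VI: s^2 in units of 10^-8 mm^2.
s_squared = [1336, 1976, 936, 256, 896, 1064, 704, 200, 544, 1056,
             3944, 256, 3384, 2296, 800, 704, 400, 776, 1280, 1464]
variances = [v * 1e-8 for v in s_squared]

sigma, rel_error = pooled_sigma([5] * 20, variances)
print(f"sigma = {sigma:.5f} mm (1 +/- {rel_error:.3f})")
```

Because the function takes the sample sizes as a list, further short series can be combined with earlier ones simply by appending their sizes and squared S.D. to the two lists.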

The averages (r.m.s. and arithmetic) of the optimum and mean estimates in
the fourth and fifth columns of Table VI compare very favorably with this
result, but it is interesting to see how the individual estimates in these
same columns fluctuate. Until the estimate of $\sigma$ written in Eq. (70)
has been displaced by a still better one, the probable error of the mean
$\bar x$ of any one of the 20 series of 5 observations each, or indeed of
any subsequent 5 observations taken under the same conditions, should be
written as

$$
[0.674\cdots \times 0.0039/\sqrt{5}\,](1 \pm 0.08) = 0.0013\,(1 \pm 0.08), \tag{71}
$$

which makes use of the estimate of $\sigma$ furnished by the 20 samples
rather than by any individual sample of 5.

Table VI. An estimate of $\sigma$ made from 20 samples of 5 each. Comparison
with the optimum and mean estimates of $\sigma$ made from the individual
samples.

By Eq. (37) the optimum estimate of $\sigma$ is $s[n/(n-1)]^{1/2} = 1.1180\,s$
when $n = 5$.

By Eq. (39) the mean estimate of $\sigma$ is
$s(n/2\pi)^{1/2}\,B(\tfrac12(n-1), \tfrac12) = 1.1894\,s$ when $n = 5$.

    Sample   (S.D.)^2 = s^2   S.D. = s      Estimates of sigma, in mm
    No.      mm^2 x 10^-8     mm x 10^-4    Optimum x 10^-4   Mean x 10^-4
      1          1336           36.55           40.87            43.47
      2          1976           44.45           49.70            52.87
      3           936           30.59           34.21            36.39
      4           256           16.00           17.89            19.03
      5           896           29.93           33.47            35.60
      6          1064           32.62           36.47            38.80
      7           704           26.53           29.66            31.56
      8           200           14.14           15.81            16.82
      9           544           23.32           26.08            27.74
     10          1056           32.50           36.33            38.65
     11          3944           62.80           70.21            74.70
     12           256           16.00           17.89            19.03
     13          3384           58.17           65.04            69.19
     14          2296           47.92           53.57            56.99
     15           800           28.28           31.62            33.64
     16           704           26.53           29.66            31.56
     17           400           20.00           22.36            23.79
     18           776           27.86           31.14            33.13
     19          1280           35.78           40.00            42.55
     20          1464           38.26           42.78            45.51
    Average                     32.41*          38.95**          38.55*

    *  arithmetic mean.
    ** root mean square.

The optimum estimate of $\sigma$ made from the 20 samples of 5 each is found
from Eq. (67):

$$
\begin{aligned}
\sigma^2 &= \frac{n_1s_1^2+n_2s_2^2+\cdots+n_ms_m^2}{n_1+n_2+\cdots+n_m-m} \\
&= \frac{5\times1336+5\times1976+\cdots+5\times1464}{5+5+\cdots+5-20}\times10^{-8} \\
&= \frac{1336+1976+\cdots+1464}{16}\times10^{-8} = 1517\times10^{-8}\ \text{mm}^2, \\
\sigma &= 0.00389\ \text{mm}.
\end{aligned}
$$

In this particular example we have at hand 400 more readings, since we have
used only 100 ($= 20 \times 5$) so far. When $\sigma$ is estimated from the
entire 500 the result is

$$
\sigma = 0.003583\,(500/499)^{1/2}\left[1 \pm 1/\sqrt{2(500-1)}\,\right] = 0.00359\,(1 \pm 0.032). \tag{72}
$$

The figure 0.003583 is the S.D. of the 500 readings. The factor
$(500/499)^{1/2}$ is hardly necessary, since $n$ is so large. The previous
estimate of $\sigma$ furnished by Eq. (70) and used in Eq. (71) should now
be replaced by the estimate in Eq. (72). In practice we are generally not so
fortunate as to have a series of 500 observations from which to estimate
$\sigma$ but must instead be content to combine several small samples by the
method of Eq. (67); indeed, more often the estimate of $\sigma$ must be made
from a single small sample. In such a case, Eq. (67) reduces to Eq. (37),
the use of which has been discussed earlier.

§5.  Conclusion 

So  far,  we  have  dealt  with  methods  for  laying 
odds  on  the  error  of  the  mean  of  a single  sample. 
The  error  of  the  mean  has  referred  throughout 
the  paper  to  the  difference  between  the  mean  of 
the  n observations  in  the  sample  and  what  the 
mean  would  be  if  n were  indefinitely  increased. 
We  have  therefore  considered  only  accidental 
errors.  As  was  stated  at  the  beginning  of  the 
paper,  no  amount  of  analysis  of  a single  sample, 
regardless  of  how  large  it  is,  can  of  itself  lead 
one  to  suspect  the  presence  of  constant  errors. 

The  parent  population  of  errors,  and  any 
sample  therefrom,  is  one  of  accidental  errors 
only.  The  mean  of  the  parent  population  is  not 
necessarily  the  true  value  of  the  thing  being 
measured ; it  is  displaced  by  an  amount  equal  to 
the  sum  of  all  the  constant  errors  that  happen 
to  be  operating.  Only  by  considering  several  sets 
of  observations  (samples)  from  different  arrange- 
ments of  apparatus  or  from  different  laboratories, 
but  supposedly  made  on  the  same  unknown 
magnitude  or  on  the  same  function,  can  sta- 
tistical tests  indicate  the  presence  of  constant 
errors. 
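
The point may be illustrated by a small simulation. This is a sketch under assumed conditions, not a procedure given in the paper: the true value, the S.D., the constant error of 0.05, and the names `simulate`, `setup_a`, and `setup_b` are all hypothetical, and the comparison of the difference of two means with its standard error is used only as a familiar stand-in for a test on several samples.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

TRUE_VALUE = 10.0  # hypothetical true magnitude being measured
SIGMA = 0.01       # hypothetical S.D. of the accidental errors
N = 500            # readings per arrangement of apparatus


def simulate(constant_error):
    """Simulated readings: true value + constant error + accidental error."""
    return [TRUE_VALUE + constant_error + rng.gauss(0.0, SIGMA) for _ in range(N)]


setup_a = simulate(0.00)  # arrangement with no constant error
setup_b = simulate(0.05)  # arrangement with an assumed constant error

# Within each sample the scatter looks the same: the constant error
# shifts the mean but leaves the S.D. untouched.
sd_a = statistics.stdev(setup_a)
sd_b = statistics.stdev(setup_b)

# Only the comparison of the two samples exposes the displacement.
diff = statistics.mean(setup_b) - statistics.mean(setup_a)
se_diff = math.sqrt(sd_a**2 / N + sd_b**2 / N)
ratio = diff / se_diff

print(f"S.D. A = {sd_a:.4f}, S.D. B = {sd_b:.4f}")
print(f"difference of means = {diff:.4f}, in units of its standard error = {ratio:.1f}")
```

Neither sample, examined alone, gives any hint of which arrangement is in error; the discrepancy appears only when the two means are set side by side.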

A large  portion  of  the  work  that  has  been 
done  in  mathematical  statistics  during  the  last 
few  years  has  been  directed  toward  the  problem 
of  several  samples,  or  toward  the  more  general 
problem presented by observations on points in
the  plane  or  in  space  when  the  true  coordinates 
would  supposedly  satisfy  a given  functional 
relation.  Statistical  methods,  together  with  the 
necessary  tables  and  charts  for  facilitating  com- 
putation, have  been  devised  from  the  results  of 
recent  advances  in  theory  for  getting  a quanti- 
tative answer  to  the  important  question  of  how 
well  or  how  poorly  a proposed  law  of  physics  is 
substantiated  by  experiment.  This  question,  as 
far  as  statistics  goes,  is  closely  related  to  the 
detection  of  constant  errors. 

The theory and method for handling
several samples present a more general problem, but
not  necessarily  a more  difficult  one,  than  the 
treatment  of  a single  sample.  In  order  that  safe 
conclusions  may  be  drawn  from  several  series  of 
observations,  it  is  essential  that  each  series 
receive  correct  statistical  treatment,  or  none  at 
all. It follows that although a single sample
cannot by itself lead to the detection of constant
errors, with either correct or incorrect treatment,
the  statistics  of  a single  sample  must  form  the 
background  for  the  interpretation  of  several 
samples.  The  present  paper  is  the  result  of  an 
attempt  to  gather  the  elements  of  the  statistics 
of  a single  sample  into  one  place  for  ready 
reference,  in  order  to  promote  the  study  of 
general  methods  for  the  interpretation  of  ob- 
servational data. 